Mutations in SARS-CoV-2 spike protein and RNA polymerase complex are associated with COVID-19 mortality risk

doi:10.21203/rs.3.rs-95183/v1

Biological Sciences - Article

This work is licensed under a CC BY 4.0 License

Version 1

SARS-CoV-2 mortality has been extensively studied in relation to a patient's predisposition to the disease. However, how sequence variations in the SARS-CoV-2 genome affect mortality is not understood. To address this issue, we used a whole-genome sequencing (WGS) association study to directly link the death of SARS-CoV-2 patients with sequence variation in the viral genome. Specifically, we analyzed 3,626 single-stranded RNA genomes of SARS-CoV-2 from patients in the GISAID database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) with a reported COVID-19 health status, i.e. deceased versus non-deceased. In total, we evaluated 28,492 loci of the viral genome for association with patient/host mortality; two loci, at 12,053 bp and 25,088 bp, achieved genome-wide significance (p-values of 1.24e-12 and 1.24e-26, respectively). Mutations at 25,088 bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry into target host cells. Additionally, mutations at 12,053 bp are within the ORF1ab gene, in a region encoding the protein nsp7, which is necessary to form the RNA polymerase complex responsible for viral replication and transcription. Both mutations alter amino acid coding sequences, potentially imposing structural changes that could enhance viral infectivity and symptom severity, and may be important to consider as targets for therapeutic development.

SARS-CoV-2 spike protein

genome

mutations

RNA polymerase complex

Viral mutations can cause increased virulence/pathogenicity (Long et al., 2020), both in animals (Geoghegan and Holmes, 2018; Brault et al., 2007) and in humans (Bae et al., 2018; Nogales et al., 2017). For the SARS-CoV-2 virus in particular, the discovery of potential links between viral mutations and disease outcome would have important implications for COVID-19 diagnosis, prognosis and treatment development. To identify such links between viral mutations and mortality, we utilized the GISAID database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017), which currently contains data on 3,626 COVID-19 patients from 68 countries for whom full metadata are available, i.e. age, sex, location and patient status, and whose viral genomes have been sequenced (see Table 1). We probed each locus of the single-stranded RNA of the SARS-CoV-2 virus for direct association with host/patient mortality. The variable “patient status” indicates whether the patient was alive or deceased at the time the virus sample was submitted to GISAID; we use it as a surrogate for mortality in our analysis. For the analysis, we repurposed the methodology of genome-wide association studies (GWAS) (Manolio, 2010). This approach is widely used in human genetics and can test thousands of genetic loci for association in datasets such as GISAID.

To identify potential confounding geographic factors in the sequencing data, we first conducted a principal component analysis of the Jaccard similarity matrix (Figure 1) computed for the 3,626 viral genomes available for our analysis. We used the Jaccard similarity matrix because, in contrast to other similarity matrices such as the variance/covariance matrix, its computation does not require estimates of the mutation frequency for each locus in the SARS-CoV-2 genome (Prokopenko et al., 2016). We found that the virus genomes clustered in distinctive branches that correspond to the geographic regions from which their data were submitted to GISAID (Forster et al., 2020) (Figure 1). If unaccounted for, this geographical clustering of the viral genomes can bias the association analysis. Hence, we generated additional eigenvector plots to investigate the number of eigenvectors needed to eliminate the bias caused by such clustering. Based on visual inspection of these plots, we selected the first 5 eigenvectors of the Jaccard matrix as covariates for the subsequent logistic regression analyses.

In the whole-genome sequencing analysis of the 3,626 SARS-CoV-2 viruses, we tested each locus (presence/absence of mutation) of the viral genome individually for association with the status indicator variable “deceased/non-deceased” of the host/patient at submission to GISAID. In the logistic regression, we also adjusted for sex, age, and the first 5 eigenvectors of the Jaccard matrix. The qq-plot of the p-values for the association tests is displayed in Fig. 2. The significance level of 5% is controlled for multiple testing using the Bonferroni correction, resulting in a corrected threshold of 0.05/28,492 = 1.75e-6. Based upon this correction, two loci of the SARS-CoV-2 genome achieved genome-wide significance: one at position 12,053 bp on the SARS-CoV-2 reference genome with a p-value of 1.24e-12, and one at 25,088 bp with a p-value of 1.24e-26 (Fig. 2).
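The Bonferroni arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation using the numbers quoted in the text; the variable names are illustrative and not taken from the authors' code.

```python
# Bonferroni-corrected threshold for the genome-wide scan described above.
alpha = 0.05       # family-wise significance level quoted in the text
n_loci = 28_492    # number of tested loci quoted in the text
threshold = alpha / n_loci

# Overall p-values reported in the text for the two loci.
p_values = {12053: 1.24e-12, 25088: 1.24e-26}

# A locus is genome-wide significant if its p-value is below the threshold.
significant = {locus: p < threshold for locus, p in p_values.items()}

print(f"{threshold:.2e}")
print(significant)
```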

To investigate the robustness of the highly significant association signals, we examined the dataset at the individual patient and locus level. Our findings were enabled by two features specific to the data. (1) The Brazilian centers enrolled much larger numbers of deceased patients than the other centers worldwide: at enrollment, 45% of the Brazilian patients were deceased, in contrast to only 9.9% in the entire dataset. (2) The genomes that carry at least one of the mutations at 12,053 bp or 25,088 bp are located predominantly in those branches of the eigenvector plot (see Fig. 1) that correspond to the PAHO/South America region, to Asia, or to Europe.

We conducted three different types of sensitivity analysis to minimize potential confounding (Table 2): (1) “Geographic region”, as indicated in Table 1, was added as a categorical variable to the logistic regression analysis. (2) Our dataset was restricted to genomes that were matched based on proximity in the eigenvector plots (see Methods for details). (3) As further examination of the deceased indicator variable revealed that all “deceased” carrier genomes came from Brazil, our analysis was restricted to genomes that were submitted from Brazil. In all three secondary analyses, both loci remained nominally significant (Table 2). For both loci, the effect size estimates of the mutations corresponded to odds ratios for mortality above 5 (Table 2). Furthermore, as the association signals for both loci stem from the Brazilian COVID-19 patients, we also computed Fisher exact tests for the data from Brazil overall and from São Paulo alone (Table 3). São Paulo is the only Brazilian center/city for which sufficient numbers were available to compute the Fisher exact test. For both loci and for both subgroup analyses, the Fisher exact test detected significant associations between the presence of a mutation and the deceased indicator variable. In sum, all results of the secondary analyses (Tables 1–4) support the genome-wide significant association between the mutation at 25,088 bp and mortality. The locus at 12,053 bp did not formally achieve genome-wide significance in the three secondary analyses (its p-values in Table 2 exceed the Bonferroni threshold of 1.75e-6), but nonetheless remains a viable candidate locus. The large effect estimates for both mutations in all analyses (Table 2) provide substantial support for the associations. Since the criteria for selection into the study likely vary by country, and may be related to the deceased indicator, the odds ratio estimate from the Brazil sample alone may be the most interpretable.
Among the samples from Brazil, 19.1% of the patients whose viral genomes carried no mutation at either locus were deceased at enrollment, compared with 83.7% of the patients whose viral genomes carried the mutation at 25,088 bp and 84.2% of the patients whose viral genomes carried both mutations. We did not observe any viral genomes that carried the mutation at 12,053 bp but not the one at 25,088 bp.

Table 1

Characteristics of 3,626 patients in the GISAID dataset for whom complete meta-information and sequenced viral genomes were available. Total number of samples (as well as females/males), numbers of deceased/non-deceased, number of deceased as a percentage of the number of non-deceased at enrollment, mean age, and mutation frequencies at both loci.
region	#total	#females	#males	deceased / non-deceased	deceased as % of non-deceased	mean age	mutation frequency in % at 12,053 bp	mutation frequency in % at 25,088 bp
entire dataset	3626	1553	2073	359 / 3267	11.0	47.4	1.7	3.6
Africa	629	420	209	1 / 628	0.2	38.5	0.0	0.0
Eastern Mediterranean	541	123	418	114 / 427	26.7	46.5	0.4	0.7
Europe	829	405	424	35 / 794	4.4	54.8	1.4	1.2
Pan American Health Organization	530	230	300	149 / 381	39.1	53.7	7.4	16.2
South-East Asia	525	171	354	59 / 466	12.7	48.2	0.4	0.8
Western Pacific	572	204	368	1 / 571	0.2	40.9	1.4	4.9

Table 2

Sample size, number of deceased samples, as well as p-values and odds ratios from the logistic regression on the two mutations: for the entire dataset, the logistic regression with WHO regions, after matching, and for samples from Brazil only.
analysis	sample size	deceased	locus	p-value	odds ratio
overall	3626	359	12,053	1.24e-12	11.7
overall	3626	359	25,088	1.24e-26	13.8
with WHO regions	3626	359	12,053	7.20e-06	6.1
with WHO regions	3626	359	25,088	1.50e-13	8.9
matching	718	359	12,053	1.18e-03	5.9
matching	718	359	25,088	1.56e-06	5.4
Brazil	201	91	12,053	8.14e-05	8.4
Brazil	201	91	25,088	6.30e-12	21.1

Table 3

P-values for the Fisher tests on the loci at 12,053 bp and 25,088 bp for São Paulo and Brazil.
region	locus	p-value
Brazil	12,053	1.12e-07
Brazil	25,088	< 2.2e-16
São Paulo	12,053	0.002465
São Paulo	25,088	1.79e-07

Given the large effect estimates for both mutations in all analyses (Table 2), it is difficult to imagine an unaccounted-for confounding mechanism that would affect mutations at just two out of almost thirty thousand loci (12,053 bp and 25,088 bp) and that would be strong enough to cause association signals as profound as the ones we observed in our analysis. Table 1 also provides a regional breakdown of the “deceased-at-enrollment” rates and the mutation frequencies for both loci. The rarity of the mutations outside of Brazil means that there is virtually no power to detect any association there, if one exists.

Single mutations in viruses can confer enhanced virulence associated with patient mortality (Bae et al., 2018; Brault et al., 2007). In our analysis of SARS-CoV-2, the mutation at 25,088 bp occurs in the spike glycoprotein, which mediates viral attachment and cellular entry. The spike protein consists of two functional subunits: S1, which contains the receptor-binding domain, and S2, which contains the machinery needed to fuse the viral membrane to the host cellular membrane. The mutation at 25,088 bp is in the S2 subunit, and specifically occurs within the S2’ site, which is cleaved by host proteases to activate membrane fusion (Fig. 3a). In many viruses, membrane fusion is activated by proteolytic cleavage, an event which has been closely linked to infectivity; for instance, a multibasic cleavage site is a signature of highly pathogenic viruses including avian influenza (Walls et al., 2020). In coronaviruses, membrane fusion is known to depend on proteolytic cleavage at multiple sites, including the S1/S2 site, located at the interface between the S1 and S2 domains, and the S2’ site, located within the S2 domain. These cleavage events can impact infection: a distinct furin cleavage site present in the SARS-CoV-2 S1/S2 site is not found in SARS-CoV (Vankadari, 2020), and it is thought to increase infectivity through enhanced membrane fusion activity (Walls et al., 2020; Vankadari, 2020; Xia et al., 2020). Consequently, mutations at these sites can alter virulence; for instance, a recent study reported that mutations disrupting the multibasic nature of the S1/S2 site affect SARS-CoV-2 membrane fusion and entry into human lung cells (Hoffmann et al., 2020). Several studies have also found that SARS-CoV mutants with an added furin recognition site at S2’ had increased membrane fusion activity (Belouzard et al., 2009; Watanabe et al., 2008).
While enhanced infectivity does not always cause a higher fatality rate, more infectious viruses can lead to a higher viral load, which can impact symptom severity and mortality (Pujadas et al., 2020). The majority of carriers for 25,088 bp (113 out of 130) exhibit a G to T missense mutation (Table 4), which changes the encoded amino acid from valine to phenylalanine. Of the other carriers, 13 had a G to A mutation corresponding to a change to isoleucine, and 4 had a G to C mutation leading to an encoded leucine. While valine, leucine, and isoleucine are branched-chain amino acids with similar biochemical and structural properties, phenylalanine has a bulkier aromatic structure. Such a substitution may impose local structural constraints, stabilize particular secondary structures (Makwana and Mahalakshmi, 2015), or introduce specific interactions which lead to preferential binding. Therefore, a mutation in the S2’ site which promotes proteolytic cleavage could theoretically enhance viral infectivity (Fig. 3b) and, consequently, patient mortality. While many current therapies primarily target the receptor-binding domain within the S1 subunit of the SARS-CoV-2 spike protein, our findings suggest that the S2 domain may be an important additional target for therapeutic development.
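The three amino acid changes listed above follow from the standard genetic code if the mutated G is the first base of the valine codon (valine codons have the form GTN, and a change of G at the third position would be synonymous). The Python sketch below illustrates this. The concrete reference codon is an assumption: the text does not state the third base, which must be T or C for the G to T change to yield phenylalanine.

```python
# Partial standard codon table: only the codons needed for this illustration.
CODON_TO_AA = {
    "GTT": "Val", "GTC": "Val",
    "TTT": "Phe", "TTC": "Phe",
    "ATT": "Ile", "ATC": "Ile",
    "CTT": "Leu", "CTC": "Leu",
}


def substitute_first_base(codon, new_base):
    """Return the codon with its first nucleotide replaced by new_base."""
    return new_base + codon[1:]


def amino_acid_changes(codon):
    """Map each alternative first base (T, A, C) to the encoded amino acid."""
    return {
        base: CODON_TO_AA[substitute_first_base(codon, base)]
        for base in "TAC"
    }


# Assumed reference codon (the third base is hypothetical).
changes = amino_acid_changes("GTT")
print(changes)
```

Under this assumption the result does not depend on whether the third base is T or C, since both choices give the same three amino acids.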

The mutation at 12,053 bp occurs within the ORF1ab gene, which expresses a polyprotein comprising 16 nonstructural proteins (Yoshimoto, 2020). Specifically, 12,053 bp lies in nsp7, which pairs with nsp8 to form a heterodimer that complexes with nsp12, ultimately forming the RNA polymerase complex essential for genome replication and transcription. Mutations causing enhanced viral polymerase activity have been linked to increased pathogenicity of influenza viruses. The majority of carriers for 12,053 bp (52 out of 61) exhibit a C to T missense mutation, which causes leucine to be replaced by phenylalanine (Table 4). Such a mutation may confer structural rigidity, which could potentially alter interactions with other components of the replication and transcription machinery, but experimental analysis is needed to test these hypotheses. Collectively, these results suggest that genetic variation in the viral genome sequence may contribute to increased COVID-19 mortality.

Table 4: Genomic variants at each locus, affected protein position, and corresponding amino acid change.

Locus	A	C	G	T	Protein	Position	Primary substitution
12053	4	3514	5	52	nsp7	70	Leu --> Phe
25088	13	4	3445	113	Spike	1176	Val --> Phe
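The carrier counts quoted in the text (61 and 130) and the overall mutation frequencies in Table 1 (1.7% and 3.6%) can be recovered from the nucleotide counts in Table 4. The sketch below does this in Python; treating the most frequent nucleotide as the reference allele is an assumption of the sketch, and the function name is illustrative.

```python
# Nucleotide counts per locus, copied from Table 4.
counts = {
    12053: {"A": 4, "C": 3514, "G": 5, "T": 52},
    25088: {"A": 13, "C": 4, "G": 3445, "T": 113},
}
n_samples = 3626  # total number of genomes in the dataset


def carrier_summary(by_base, n_total):
    """Return (reference base, number of carriers, carrier frequency in %)."""
    # Assumption: the most frequent nucleotide is the reference allele.
    reference = max(by_base, key=by_base.get)
    carriers = sum(n for base, n in by_base.items() if base != reference)
    return reference, carriers, round(100 * carriers / n_total, 1)


summary = {locus: carrier_summary(c, n_samples) for locus, c in counts.items()}
for locus, (reference, carriers, freq) in summary.items():
    print(locus, reference, carriers, freq)
```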

2.1 Data acquisition

The analysis presented in this article is based on nucleotide sequences with accession numbers EPI_ISL_402124 to EPI_ISL_541337, downloaded from the GISAID database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) as a file in "fasta" format on 22 September 2020. Only patients with additional metadata (age, sex, and hospitalization status as plain text comments) were selected on GISAID, resulting in 7,151 samples.

2.2 Data cleaning

We filtered the 7,151 samples for complete nucleotide sequences, and aligned them using the anchor sequence given in Hahn et al. (2020a). After this step, 4,385 samples remained.

Using the location tag in the fasta file, we grouped all samples according to the WHO regional offices for Africa (AFRO), for the Eastern Mediterranean (EMRO), for Europe (EURO), for South-East Asia (SEARO), for the Western Pacific (WPRO), as well as the Pan American Health Organization (PAHO). In particular, the countries included in each group are as follows: (1) AFRO (Algeria, South Africa, Gambia, Nigeria, Senegal, as well as Congo); (2) EMRO (Egypt, Morocco, Kuwait, Lebanon, Oman, Saudi Arabia, United Arab Emirates, as well as Iran); (3) EURO (Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Faroe Islands, France, Germany, Hungary, Italy, Poland, Portugal, Romania, Russia, Slovakia, Spain, Sweden, as well as Israel, Turkey, Kazakhstan, Andorra, and Georgia); (4) PAHO (Canada, USA, Costa Rica, Mexico, Argentina, Brazil, Chile, Colombia, Ecuador, Peru, Venezuela, as well as Puerto Rico and Uruguay); (5) SEARO (Bangladesh, India, Indonesia, Myanmar, Nepal, Sri Lanka, Thailand); (6) WPRO (Cambodia, Japan, Malaysia, Vietnam, Australia, Guam, Hong Kong, China, Singapore, as well as South Korea and Taiwan). Filtering for only those samples falling into the aforementioned geographical categories resulted in 4,378 samples.

Finally, we matched these 4,378 samples to the metadata information (age, sex, clinical outcome) available on GISAID. Filtering for those samples having complete metadata information resulted in n=3,626 samples.

2.3 Data analysis

After aligning all sequences, we established a window of length 28,492 bp in which all viral sequences have reads. We compared all trimmed and aligned sequences entry-wise to the SARS-CoV-2 reference sequence published on GISAID (accession number EPI_ISL_412026) and recorded the result in a binary matrix X: the entry X_ij=1 indicates that sequence i deviates from the reference sequence at position j, and all other entries of X are zero.
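As a minimal illustration of the mismatch matrix X, the following Python sketch compares already-aligned sequences to a reference, position by position. The toy sequences are hypothetical and much shorter than the 28,492 bp window; the function name is illustrative.

```python
def mismatch_matrix(sequences, reference):
    """Return X with X[i][j] = 1 if sequence i differs from the reference at position j."""
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the reference length")
    return [
        [int(base != ref_base) for base, ref_base in zip(seq, reference)]
        for seq in sequences
    ]


# Toy aligned sequences (hypothetical, not GISAID data).
reference = "ACGTAC"
sequences = ["ACGTAC", "ACTTAC", "TCGTAA"]
X = mismatch_matrix(sequences, reference)
for row in X:
    print(row)
```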

We used the R-package "locStra" (Hahn et al., 2020c,d) to calculate the Jaccard similarity matrix (Jaccard, 1901; Tan et al., 2005; Prokopenko et al., 2016; Schlauch et al., 2017) for the n viral genomes based on the matrix X. The Jaccard matrix J(X) has n rows and n columns, and each entry (i,j) is the Jaccard similarity index between the ith and jth SARS-CoV-2 genome in our dataset. Computing the first 5 eigenvectors of the Jaccard similarity matrix J(X) allows us to visualize the geographic clustering of the viral genomes, as well as to guard the logistic regression analysis against such confounding by including the first eigenvectors in the regression analysis as covariates.
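The analysis itself relied on the R package locStra. Purely to illustrate what a Jaccard similarity matrix contains, here is a small pure-Python sketch on a toy binary matrix. How two genomes without any mutation are scored is an assumption of this sketch (set to 0 here) and may differ from locStra's convention. The eigenvectors of J would then be obtained with a numerical linear algebra routine, which is omitted.

```python
def jaccard_index(x, y):
    """Jaccard index of two binary vectors: |intersection| / |union| of mutated positions."""
    a = {j for j, v in enumerate(x) if v}
    b = {j for j, v in enumerate(y) if v}
    union = a | b
    if not union:
        # Assumption of this sketch: two mutation-free genomes get similarity 0.
        return 0.0
    return len(a & b) / len(union)


def jaccard_matrix(X):
    """Return the n-by-n matrix of pairwise Jaccard indices between the rows of X."""
    n = len(X)
    return [[jaccard_index(X[i], X[j]) for j in range(n)] for i in range(n)]


# Toy mismatch matrix with three genomes and four loci (hypothetical).
X_toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
J = jaccard_matrix(X_toy)
for row in J:
    print(row)
```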

For the association analysis of the entire viral genome, we defined the response to be a binary indicator for the clinical outcome, where we only distinguish between all those patients/hosts whose hospitalization status tag at enrollment into the GISAID database was listed as “deceased” (outcome of 1) versus the remaining samples as non-deceased (outcome of 0). At this point, no other information regarding clinical outcome was available in GISAID.

We then performed p=28,492 logistic regressions of the binary outcome variable on the following covariates: the column vector X_·j encoding the mismatches of each sample at the j'th location on the SARS-CoV-2 nucleotide sequence, the patient’s age and sex, and the first 5 eigenvectors of the Jaccard matrix. The logistic regression was carried out in R using the default “glm” command, with the parameter “family” set to “binomial(link="logit")”. We tested the j'th locus of the viral genome for association with mortality by testing whether the regression coefficient for column X_·j is equal to zero. Finally, we assessed the significance of each tested locus by comparing the individual p-values to a Bonferroni-corrected threshold of 0.05/p=1.75e-06.
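To make the per-locus test concrete, the sketch below fits a logistic regression with an intercept and a single binary covariate by Newton-Raphson and reports a Wald p-value for the covariate. It is a simplified stand-in for the R glm call, not the authors' code: age, sex and the eigenvector covariates are omitted, and the toy data and function name are hypothetical.

```python
import math


def fit_logistic_one_covariate(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x; return (b0, b1, se_b1, wald_p_value)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step using the closed-form inverse of the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)   # standard error of b1 at convergence
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
    return b0, b1, se_b1, p_value


# Toy data (hypothetical): 10 carriers (8 deceased), 20 non-carriers (5 deceased).
x = [1] * 10 + [0] * 20
y = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 15
b0, b1, se_b1, p_value = fit_logistic_one_covariate(x, y)
odds_ratio = math.exp(b1)
print(round(odds_ratio, 3), round(p_value, 4))
```

With a single binary covariate the fitted odds ratio equals the cross-product ratio of the 2x2 table, here (8 × 15)/(2 × 5) = 12, which gives a simple check of the fit.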

We observed in Figure 1 that the viral genomes that carry at least one mutation at 12,053 bp or 25,088 bp are located in distinct branches that correspond to geographic regions. Therefore, we performed two additional logistic regression analyses to take the observed clustering into account: one in which we also included “region” as a categorical covariate in the regression model, and one in which we matched the viral genomes based on their positions in the eigenvector plot (Figure 1). For the matched analysis, we identified the 359 viral genomes whose patient indicator variable is set as “deceased” (outcome of 1). For each of the 359 deceased samples, we identified the “non-deceased” viral genome that is closest in terms of Euclidean distance in the eigenvector plot (Figure 1), where each deceased sample is matched to a different “non-deceased” sample.
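One way to implement such a one-to-one matching is a greedy nearest-neighbour pass, sketched below in Python. The text does not specify the processing order or whether the matching was greedy or globally optimal, so both choices here are assumptions; the coordinates and function name are hypothetical.

```python
import math


def greedy_match(cases, controls):
    """Match each case to its nearest unused control; return (case_idx, control_idx) pairs."""
    if len(controls) < len(cases):
        raise ValueError("need at least as many controls as cases")
    available = set(range(len(controls)))
    pairs = []
    for i, case in enumerate(cases):
        # Nearest remaining control by Euclidean distance; ties broken by index.
        j = min(available, key=lambda k: (math.dist(case, controls[k]), k))
        available.remove(j)  # each control is used at most once
        pairs.append((i, j))
    return pairs


# Toy eigenvector coordinates (hypothetical).
cases = [(0.0, 0.0), (1.0, 1.0)]
controls = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
pairs = greedy_match(cases, controls)
print(pairs)
```

A greedy pass depends on the order in which the deceased samples are processed; an optimal assignment would avoid that dependence at a higher computational cost.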

Finally, we also report results from the Fisher exact test that is applied to contingency tables. To this end, for a certain subgroup of the population (e.g., location tag “Brazil”), we determined the number of deceased and non-deceased samples which are carriers or non-carriers of the mutations at 12,053bp or 25,088bp, respectively. The resulting 2x2 contingency table was tested in R with the default “fisher.test” command, and the p-value of the Fisher test was reported.
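For readers without R at hand, the two-sided Fisher exact test on a 2x2 table can be sketched in pure Python as below. It sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, using a small relative tolerance for ties. The example table is hypothetical and is not taken from the Brazil or São Paulo data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all admissible tables that are no more likely than the observed one.
    total = sum(
        p for p in (prob(k) for k in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical table: rows = carrier / non-carrier, columns = deceased / non-deceased.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```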

Acknowledgements

The authors gratefully acknowledge the contributors, originating and submitting laboratories of the sequences from GISAID's EpiCoV^TM Database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) on which this research is based. A detailed list of contributors is available in the Supplementary Information.

Data Availability Statement

Sequence data that support the findings of this study are deposited in the GISAID database with accession numbers in the range of EPI_ISL_402124 to EPI_ISL_541337 (https://www.gisaid.org/).

Agostini M.L., Andres E.L., Sims A.C., Graham R.L., Sheahan T.P., Lu X., Smith E.C., Case J.B., Feng J.Y., Jordan R., Ray A.S., Cihlar T., Siegel D., Mackman R.L., Clarke M.O., Baric R.S., and Denison M.R. (2018). Coronavirus Susceptibility to the Antiviral Remdesivir (GS-5734) Is Mediated by the Viral Polymerase and the Proofreading Exoribonuclease. mBio, 9(2):e00221-18. doi:10.1128/mBio.00221-18
Bae J.-Y., Lee I., Kim J.I., Park S., Yoo K., Park M., Kim G., Park M.S., Lee J.-Y., Kang C., Kim K., and Park M.-S. (2018). A Single Amino Acid in the Polymerase Acidic Protein Determines the Pathogenicity of Influenza B Viruses. J Virol, 92(13):e00259-18.
Becerra-Flores M. and Cardozo T. (2020). SARS‐CoV‐2 viral spike G614 mutation exhibits higher case fatality rate. Int J Clin Pract, 74(8):1–4.
Belouzard S., Chu V.C., and Whittaker G.R. (2009). Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc Natl Acad Sci USA, 106(14):5871–5876.
Brault A.C., Huang C., Langevin S.A., Kinney R.M., Bowen R.A., Ramey W.N., Panella N.A., Holmes E.C., Powers A.M., and Miller B.R. (2007). A single positively selected West Nile viral mutation confers increased virogenesis in American crows. Nature Genetics, 39:1162–1166.
Eaaswarkhanth M., Al Madhoun A., Al-Mulla F. (2020). Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality? Int J Infect Dis, 96:459–460.
Elbe S. and Buckland-Merrett G. (2017). Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33–46.
Forster P., Forster L., Renfrew C., and Forster M. (2020). Phylogenetic network analysis of SARS-CoV-2 genomes. PNAS, 117(17):9241–9243.
Gao Y., Yan L., Huang Y., Liu F., Zhao Y., Cao L., Wang T., Sun Q., Ming Z., Zhang L., Ge J., Zheng L., Zhang Y., Wang H., Zhu Y., Zhu C., Hu T., Hua T., Zhang B., Yang X., Li J., Yang H., Liu Z., Xu W., Guddat L.W., Wang Q., Lou Z., and Rao Z. (2020). Structure of the RNA-dependent RNA polymerase from COVID-19 virus. Science, 368(6492):779–782.
Geoghegan J.L. and Holmes E.C. (2018). The phylogenomics of evolving virus virulence. Nature Reviews Genetics, 19:756–769.
Hahn G., Lee S., Weiss S.T., and Lange C. (2020). Unsupervised cluster analysis of SARS-CoV-2 genomes indicates that recent (June 2020) cases in Beijing are from a genetic subgroup that consists of mostly European and South(east) Asian samples, of which the latter are the most recent. doi:10.1101/2020.06.22.165936.
Hahn G., Lee S., Weiss S.T., and Lange C. (2020). Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus. doi:10.1101/2020.05.05.079061.
Hahn G., Lutz S.M., Hecker J., Prokopenko D., Cho M.H., Silverman E., Weiss S.T., and Lange C. (2020). locstra: Fast analysis of regional/global stratification in whole genome sequencing (wgs) studies. Genetic Epidemiology (to appear). doi:10.1002/gepi.22356.
Hahn G., Lutz S.M., and Lange C. (2020). locStra: Fast Implementation of (Local) Population Stratification Methods (v1.3). https://cran.r-project.org/package=locStra.
Hoffmann M., Kleine-Weber H., and Pöhlmann S. (2020). A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells. Mol Cell, 78(4):779–784.e5.
Jaccard P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaud Des Sci Nat, 37:547–579.
Kirchdoerfer R.N. and Ward A.B. (2019). Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat Commun, 10(1):2342.
Lei J., Kusov Y., and Hilgenfeld R. (2018). Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res, 149:58–74.
Long S.W., Olsen R.J., Christensen P.A., Bernard D.W., Davis J.J., Shukla M., Nguyen M., Saavedra M.O., Yerramilli P., Pruitt L., Subedi S., Kuo H.-C., Hendrickson H., Eskandari G., Nguyen H.A.T., Long J.H., Kumaraswami M., Goike J., Boutz D., Gollihar J., McLellan J.S., Chou C.-W., Javanmardi K., Finkelstein I.J., and Musser J. (2020). Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area. doi:10.1101/2020.09.22.20199125
Makwana K.M. and Mahalakshmi R. (2015). Implications of aromatic-aromatic interactions: From protein structures to peptide models. Protein Sci, 24(12):1920–33.
Manolio T.A. (2010). Genomewide Association Studies and Assessment of the Risk of Disease. N Engl J Med, 363:166–176.
Nogales A., Martinez-Sobrido L., Topham D.J., and DeDiego M.L. (2017). NS1 Protein Amino Acid Changes D189N and V194I Affect Interferon Responses, Thermosensitivity, and Virulence of Circulating H3N2 Human Influenza A Viruses. J Virol, 91(5):e01930-16.
Pachetti M., Marini B., Benedetti F., Giudici F., Mauro E., Storici P., Masciovecchio C., Angeletti S., Ciccozzi M., Gallo R.C., Zella D., and Ippodrino R. (2020). Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J Transl Med 18, 179.
Price M.N., Dehal P.S., and Arkin A.P. (2010). FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS One, 5(3):e9490.
Prokopenko D., Hecker J., Silverman E., Pagano M., Nöthen M., Dina C., Lange C., and Fier H. (2016). Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project. Bioinformatics, 32(9):1366–1372.
Pujadas E., Chaudhry F., McBride R., Richter F., Zhao S., Wajnberg A., Nadkarni G., Glicksberg B.S., Houldsworth J., and Cordon-Cardo C. (2020). SARS-CoV-2 viral load predicts COVID-19 mortality. Lancet Respir Med, 8(9):e70.
Roeder K., Bacanu S.-A., Wasserman L., and Devlin B. (2006). Using Linkage Genome Scans to Improve Power of Association in Genome Scans. Am J Hum Genet, 78:243–252.
Schlauch D., Fier H., and Lange C. (2017). Identification of genetic outliers due to sub-structure and cryptic relationships. Bioinformatics, 33(13):1972–1979.
Shu Y. and McCauley J. (2017). GISAID: Global initiative on sharing all influenza data -- from vision to reality. EuroSurveillance, 22(13):30494.
Tan P.-N., Steinbach M., and Kumar V. (2005). Introduction to Data Mining. Pearson; 1st Edition.
Toyoshima Y., Nemoto K., Matsumoto S., Nakamura Y., Kiyotani K. (2020). SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J Hum Genet. doi:10.1038/s10038-020-0808-9
Vankadari N. (2020). Structure of Furin Protease Binding to SARS-CoV-2 Spike Glycoprotein and Implications for Potential Targets and Virulence. J Phys Chem Lett, 11(16):6655–6663.
Walls A.C., Park Y.-J., Tortorici M.A., Wall A., McGuire A.T., and Veesler D. (2020). Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell, 181(2):281–292.e6.
Watanabe R., Matsuyama S., Shirato K., Maejima M., Fukushi S., Morikawa S., and Taguchi F. (2008). Entry from cell surface of SARS coronavirus with cleaved S protein as revealed by pseudotype virus bearing cleaved S protein. J Virol, 82(23):11985–11991.
Williamson E.J., Walker A.J., Bhaskaran K., Bacon S., Bates C., Morton C.E., Curtis H.J., Mehrkar A., Evans D., Inglesby P., Cockburn J., McDonald H.I., MacKenna B., Tomlinson L., Douglas I.J., Rentsch C.T., Mathur R., Wong A.Y.S., Grieve R., Harrison D., Forbes H., Schultze A., Croker R., Parry J., Hester F., Harper S., Perera R., Evans S.J.W., Smeeth L., and Goldacre B. (2020). Factors associated with COVID-19-related death using OpenSAFELY. Nature, 584(7821):430–436.
Xia S., Liu M., Wang C., Xu W., Lan Q., Feng S., Qi F., Bao L., Du L., Liu S., Qin C., Sun F., Shi Z., Zhu Y., Jiang S., and Lu L. (2020). Inhibition of SARS-CoV-2 (previously 2019-nCoV) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion. Cell Res, 30(4):343–355.
Yin W., Mao C., Luan X., Shen D.-D., Shen Q., Su H., Wang X., Zhou F., Zhao W., Gao M., Chang S., Xie Y.-C., Tian G., Jiang H.-W., Tao S.-C., Shen J., Jiang Y., Jiang H., Xu Y., Zhang S., Zhang Y., Xu H.E. (2020). Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir. Science, 368(6498):1499–1504.
Yoshimoto F.K. (2020). The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19. Protein J, 39(3):198–216.

Supplementary Information was not provided with this version of the manuscript.

The authors declare no competing interests.
