SARS-COV-2 sequences and data collection
A total 209 SARS-COV-2 genome sequences from humans, cats, dogs, tigers, lions, hamster and minks were used for the genetic analysis (The strains information was recorded in the Supplementary Table S1). All the genomic sequences selected by the hosts were obtained from the GIAISD database (https://www.gisaid.org/). Isolate Wuhan/WIV04 was used as the reference strain.
Evolutionary analysis
209 SARS-COV-2 genomes were used for phylogenetic analysis. The evolutionary history was inferred by using the Maximum Likelihood method and Tamura-Nei model [19]. The tree with the highest log likelihood (-42442.50) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.0500)). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. There was a total of 29903 positions in the final dataset. Evolutionary analyses were conducted in MEGA-X [20].
Identification of mutations
The sequences were aligned using MEGA-X, and the single nucleotide polymorphisms were analyzed using the SNiPlay pipeline by uploading aligned Fasta format file (https://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi)[21]. All the sequences including coding regions, 5’UTR and 3’UTR were used for the analysis.
Estimation of nonsynonymous and synonymous substitution rates
The number of nonsynonymous substitutions per synonymous site (dN) and the number of synonymous substitutions per nonsynonymous site (dS) for each coding site were calculated using the Nei–Gojobori method (Jukes–Cantor) in MEGA-X. The Datamonkey adaptive evolution server (http://www.datamonkey.org) was used to identify sites where only some of the branches have undergone selective pressure. The mixed-effects model of evolution (MEME) and fixed effects likelihood (FEL) approaches were used to infer the nonsynonymous and synonymous substitution rates.
Codon usage analysis
The codon adaptation index (CAI) of a given coding sequence was calculated using R script [22]. A CAI analysis of those coding sequences from different hosts was performed using DAMBE 5.0 and the CAI [23, 24]. The codon usage data of different hosts were retrieved from the codon usage database (http://www.kazusa.or.jp/codon/), and the relative synonymous codon usages (RSCUs) were analyzed using MEGA-X software. Animal-SARS2 means SARS-CoV-2 isolated from the indicted animals, the access IDs from mink, cat, dog, tiger and lion are MT457401.1, MT747438.1, MT215193.1, MT704316.1 and MT704312.1. Bat-CoV refers to RaTG13 (GenBank: MN996532.2) and the corresponding host (bat), Pangolin-COV refers to pangolin coronavirus (GenBank: QLR06867.1).
Neutral evolution analysis
Neutrality plot analysis was constructed to determine the codon bias influenced by natural selection. It reflected the neutrality effect of directional mutation pressure equilibrium in shaping the codon usage bias [25]. Mostly the GC3 position has an equal probability of appearing A/T and G/C nucleotides. There will be variation between GC12 against GC3 regression values due to the directional mutational pressure. Here GC12 against GC3 were plotted with a regression line.
Spike protein sequence and structure reconstruction
The crystal structure of the SARS-CoV-2 receptor-binding domain (RBD) in complex with human ACE2 (PBD ID: 6M0J) was used for structural analysis. Structures of ACE2 and the viral spike from mink were constructed by the SWISS model server (https://swissmodel.expasy.org/). Comparisons of the predicted protein structures and pairwise comparisons were analyzed using PyMOL software.
Molecular dynamics
For the binding free energy (𝐸), we simulated the minimized annealing energy through molecular dynamics (MD) simulation in YASARA [26]. We performed three iterations of energy minimization for the set of wild-type residues in the viral spike protein bound with human ACE2 and mutant residues in the Mink-SARS2 spike with mink ACE2. The relative binding energy (∆𝐸) are reported as the mean and standard deviation values across three replicates.
Selective coefficient index
The selection coefficient index (S) of all SARS-CoV-2 codons was estimated by the FMutSel0 model in the program CODEML (PAML package) [27]. The fitness parameter of the most common residues at each location is fixed to 0, while the other fitness parameters are limited to −20 < F < 20.
Plasmids construction and cell culture
The gene fragment of RBD of SARS-COV-2 (NCBI ID: MN996528.1) inserted into pCAGGS plasmid was donated by Prof. Jianguo Wu, RBD fragments of SARS-COV-2 and Mink-ACE2 (NCBI ID: MW269526.1) were synthesized by Genscript Inc. The mutants of RBD genes were constructed using the Mut Express II Fast Mutagenesis Kit (Vazyme, C214). The wild type and mutant genes were cloned into pCAGGS vector with a 6 ´ Histidine tags using the EcoRI and XhoI restriction sites. Mink-ACE2 fragment was inserted into vector pcDNA3.1-eGFP (named pcDNA3.1-mACE2-eGFP). The products were subsequently transformed into competent E.coli Top10 strain. Recombinant plasmids were verified by Next-generation sequencing. Plasmid pcDNA3.1-hACE2-eGFP expressing hACE2 fused with eGFP was ordered from Fubio Ltd (MC_0101086). HEK293T and BHK-21 cells were cultured by high-sugar DMEM medium in 5% CO2 atmosphere of 37℃ incubator.
Protein expression and purification
The recombinant RBDs of SARS-COV-2 mutants were expressed in CHO cells. Recombinant pCAGGS plasmids were transfected into CHO cells in 245 mm dishes according to the manufacturer’s recommendations; the supernatant was collected and centrifuged after 5 days. The soluble proteins were purified by using HisPur™ Ni-NTA Resin (Thermo Scientific, 88221). Purified proteins were eluted in buffer consisting of 20 mM Tris-HCl (pH 8.0) and 150 mM NaCl.
Flow cytometry
The plasmids pcDNA3.1-hACE2-GFP and pcDNA3.1-mACE2-eGFP were transfected into BHK-21 cells using Liposome 6000 (Beyotime, C0526) according to the manufacturer's instructions. The cells expressing hACE2-GFP and mACE2-GFP were collected after transfection for 24 h and suspended in the PBS buffer. Then, the cells were incubated with the purified His-tagged RBDs at a final concentration of 30 μg/mL at 37ºC for 30 min. After washed by PBS twice and incubation with anti-His/APC antibodies (1:5000), the cells were determined using a BeckMan CytoFLEX and the data were analyzed using FlowJo V10 software.
Statistical analysis and mapping
Statistical analyses were performed using ANOVA followed by Turkey's post hoc test (Fig 2C&2F) or Student’s t-test (Fig 3E), and the data were considered significantly different if the p-value was less than 0.05. The SPSS 20.0 software was used for regression curve fit. ***p<0.001, **p<0.01, *p<0.05, ns. means no significant. The figures were mapped by the software PRISM GraphPad 5.0.