Table 1: GenBank data for the amino acid sequences of 12 S glycoproteins of SARS-CoV-2 isolates from China, Iran, and Tunisia.

| No. | GenBank Accession | Surface Glycoprotein (S) Gene CDS Definition | Base Pairs¹ | Protein ID | Amino Acids | Collection Date⁵ | Isolation Source | Locality |
|---|---|---|---|---|---|---|---|---|
| 1 | MN938387.1 | 2019-nCoV_HKU-SZ-001_2020² | 107 | QHN73805.1 | 35 | 2020-01 | Nasopharyngeal swab | China: Shenzhen 1 |
| 2 | MN938388.1 | 2019-nCoV_HKU-SZ-002b_2020² | 107 | QHN73806.1 | 35 | 2020-01 | Serum | China: Shenzhen 2 |
| 3 | MN938389.1 | 2019-nCoV_HKU-SZ-004_2020² | 107 | QHN73807.1 | 35 | 2020-01 | Nasopharyngeal swab | China: Shenzhen 3 |
| 4 | MN938390.1 | 2019-nCoV_HKU-SZ-005_2020² | 107 | QHN73808.1 | 35 | 2020-01 | Throat swab | China: Shenzhen 4 |
| 5 | MN975266.1 | 2019-nCoV_HKU-SZ-007a_2020² | 107 | QHN73822.1 | 35 | 2020-01 | Nasopharyngeal swab | China: Shenzhen 5 |
| 6 | MN975267.1 | 2019-nCoV_HKU-SZ-007b_2020² | 107 | QHN73823.1 | 35 | 2020-01 | Throat swab | China: Shenzhen 6 |
| 7 | MN975268.1 | 2019-nCoV_HKU-SZ-007c_2020² | 107 | QHN73824.1 | 35 | 2020-01 | Sputum | China: Shenzhen 7 |
| 8 | MT232871.1 | SARS-CoV-2/human/IRN/MHKN-1/2020³ | 157 | QIQ08768.1 | 52 | 2020-02-26 | Nasopharyngeal swab | Iran: Tehran 1 |
| 9 | MT232872.1 | SARS-CoV-2/human/IRN/MHKN-2/2020³ | 158 | QIQ08769.1 | 52 | 2020-02-26 | Nasopharyngeal swab | Iran: Tehran 2 |
| 10 | MT308701.1 | SARS-CoV-2/human/TUN/Tunis7266/2020 | 493 | QIV64962.1 | 163 | 2020-04-02 | Nasopharyngeal swab | Tunisia: Tunis 1 |
| 11 | MT324679.1 | SARS-CoV-2/human/TUN/Tunis_6401/2020⁴ | 491 | QIZ14987.1 | 163 | 2020-03-29 | Nasopharyngeal swab | Tunisia: Tunis 2 |
| 12 | MT324680.1 | SARS-CoV-2/human/TUN/Tunis_7643/2020⁴ | 491 | QIZ14988.1 | 163 | 2020-04-03 | Nasopharyngeal swab | Tunisia: Tunis 3 |

Abbreviations: CDS, coding sequence; HKU, The University of Hong Kong; ID, identifier; IRN, Iran; kDa, kilodalton; MHKN, Mohammad Hadi Karbalaie Niya; MW, molecular weight; nCoV, novel coronavirus; SARS, severe acute respiratory syndrome; SZ, Shenzhen; TUN, Tunisia.
1 All original sequences were generated by Sanger dideoxy sequencing.
2 Identical proteins from Wuhan seafood market pneumonia virus isolates from China.
3 Identical proteins from severe acute respiratory syndrome coronavirus 2 isolates from Iran.
4 Identical proteins from severe acute respiratory syndrome coronavirus 2 isolates from Tunisia.
5 Isolates are listed in their order of appearance in GenBank, not in chronological order of collection date.
Table 2: Percent identity matrix, created with Clustal 2.1, showing the percentage identity between the 12 amino acid sequences.

| No. | Isolate | Protein ID | China 1 | China 2 | China 3 | China 4 | China 5 | China 6 | China 7 | Iran 1 | Iran 2 | Tunisia 1 | Tunisia 2 | Tunisia 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | China 1 | QHN73805.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 2 | China 2 | QHN73806.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 3 | China 3 | QHN73807.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 4 | China 4 | QHN73808.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 5 | China 5 | QHN73822.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 6 | China 6 | QHN73823.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 7 | China 7 | QHN73824.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 |
| 8 | Iran 1 | QIQ08768.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 29.27 | 29.27 | 29.27 |
| 9 | Iran 2 | QIQ08769.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 29.27 | 29.27 | 29.27 |
| 10 | Tunisia 1 | QIV64962.1 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 29.27 | 29.27 | 100 | 99.39 | 99.39 |
| 11 | Tunisia 2 | QIZ14987.1 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 29.27 | 29.27 | 99.39 | 100 | 100 |
| 12 | Tunisia 3 | QIZ14988.1 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 29.27 | 29.27 | 99.39 | 100 | 100 |

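Table 2 reports 100% identity between the 35-residue Chinese sequences and the 52-residue Iranian sequences. One way such a value can arise is when identity is scored only over aligned positions. The Python sketch below builds a pairwise percent-identity matrix from sequences that are already aligned, assuming identity = identical residues divided by the alignment columns where neither sequence has a gap ('-'). This convention is an assumption for illustration and may differ in detail from how Clustal 2.1 scores identity; the function names and the toy sequences are hypothetical, not the GenBank entries.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of gap-free aligned columns at which a and b carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned: dict[str, str]) -> list[list[float]]:
    """Symmetric matrix of pairwise percent identities, rows in dict order."""
    names = list(aligned)
    return [
        [round(percent_identity(aligned[p], aligned[q]), 2) for q in names]
        for p in names
    ]


# Hypothetical toy alignment: "short" is embedded in "long" with terminal gaps,
# and "variant" differs from "long" at a single position (I -> L).
toy = {
    "short": "--MKTAYIAK--",
    "long": "GSMKTAYIAKQL",
    "variant": "GSMKTAYLAKQL",
}

for name, row in zip(toy, identity_matrix(toy)):
    print(name, row)
```

Under this convention "short" and "long" score 100% despite their different lengths, because the terminal gap columns are never compared; the matrix is symmetric with 100 on the diagonal, as in Table 2.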
Table 3: Comparison of the biochemical profiles of the 12 SARS-CoV-2 S glycoprotein amino acid sequences, computed with the ExPASy ProtParam tool.

| Parameter | China 1-7¹ | Iran 1-2¹ | Tunisia 1¹ | Tunisia 2-3¹ |
|---|---|---|---|---|
| Molecular weight (kDa) | 3.93 | 5.76 | 17.67 | 17.67² |
| Number of negatively charged residues (Asp + Glu) | 6 | 7 | 13 | 12 |
| Number of positively charged residues (Arg + Lys) | 4 | 5 | 15 | 15 |
| Chemical formula | C174H267N47O57 | C257H396N66O80S2 | C777H1252N222O238S5 | NA |
| Total number of atoms | 545 | 801 | 2494 | NA |
| Tryptophan residues present | No | No | No | No |
| Extinction coefficient (M⁻¹ cm⁻¹) | 4470 | 4595³ | 4595³ | 4595³ |
| Absorbance, Abs 0.1% (1 g/L) | 1.138 | 0.799 | 0.26 | 0.26 |
| N-terminal residue⁴ | N (Asn) | P (Pro) | Q (Gln) | Q (Gln) |
| Estimated half-life: mammalian reticulocytes, in vitro | 1.4 h | >20 h | 0.8 h | 0.8 h |
| Estimated half-life: yeast, in vivo | 3 min | >20 h | 10 min | 10 min |
| Estimated half-life: Escherichia coli, in vivo | >10 h | NA | 10 h | 10 h |
| Instability index (II) | 17.20 (Stable) | 5.79 (Stable) | 49.67 (Unstable) | 47.19 (Unstable) |
| Aliphatic index | 78.00 | 82.50 | 94.54 | 94.54 |
| GRAVY score | -0.671 | -0.188 | -0.050 | -0.029 |

Abbreviations: GRAVY, grand average of hydropathicity; h, hours; kDa, kilodaltons; min, minutes; NA, not available.
1 China 1-7 comprises identical sequences (QHN73805.1, QHN73806.1, QHN73807.1, QHN73808.1, QHN73822.1, QHN73823.1, QHN73824.1). Iran 1-2 comprises identical sequences (QIQ08768.1 and QIQ08769.1). Tunisia 1 is a single sequence (QIV64962.1) that is not identical to any other sequence in the set. Tunisia 2-3 comprises identical sequences (QIZ14987.1 and QIZ14988.1).
2 ExPASy gives a molecular weight of 17.67 kDa for all 163 amino acids. The Protein Molecular Weight tool at Bioinformatics.org excludes the unknown amino acid and gives 17.56 kDa for the remaining 162 amino acids.
3 Assuming that all pairs of Cys residues form cystines. If all Cys residues are instead assumed to be reduced, the extinction coefficient is 4470 M⁻¹ cm⁻¹, and the absorbance is 0.777 for the Iranian models (1-2) and 0.253 for all Tunisian models.
4 The N-terminal residue of the Chinese models (Asn) is polar with an uncharged side chain. The N-terminal residue of the Iranian models (Pro) is nonpolar with an uncharged side chain. The N-terminal residue of the Tunisian models (Gln) is polar with an uncharged side chain.
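Several Table 3 values are linked by simple arithmetic, so they can be cross-checked against each other. The Python sketch below assumes standard average atomic masses and the ProtParam-style extinction formula ε = 5500·nTrp + 1490·nTyr + 125·nCystine, with Abs 0.1% = ε / molecular weight (in Da). The Tyr and cystine counts it uses are inferred from the reported coefficients (4470 = 3 × 1490; 4595 = 4470 + 125) rather than read from the sequences, and the stability cutoff of 40 follows the ProtParam convention. Tunisia 2-3 is omitted because its chemical formula is not available. All helper names are hypothetical.

```python
import re

# Standard average atomic masses in Da (assumed values, not from the source).
ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# ProtParam-style molar absorptivities at 280 nm, in M^-1 cm^-1.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def parse_formula(formula: str) -> dict[str, int]:
    """Split a formula such as 'C174H267N47O57' into per-element atom counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Average molecular weight in Da, summed over the formula's atoms."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def extinction_coefficient(n_trp: int, n_tyr: int, n_cystine: int) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


def abs_01_percent(eps: float, mw_da: float) -> float:
    """Absorbance of a 1 g/L solution: extinction coefficient / molecular weight."""
    return eps / mw_da


def stability_class(instability_index: float) -> str:
    """ProtParam convention: an index above 40 predicts an unstable protein."""
    return "Unstable" if instability_index > 40 else "Stable"


# Inputs from Table 3; the Tyr and cystine counts are INFERRED from the coefficients.
PROFILES = {
    # name: (chemical formula, Tyr, cystines, instability index)
    "China 1-7": ("C174H267N47O57", 3, 0, 17.20),
    "Iran 1-2": ("C257H396N66O80S2", 3, 1, 5.79),
    "Tunisia 1": ("C777H1252N222O238S5", 3, 1, 49.67),
}

for name, (formula, n_tyr, n_cystine, ii) in PROFILES.items():
    mw = molecular_weight(formula)
    eps = extinction_coefficient(0, n_tyr, n_cystine)  # no Trp in any model
    atoms = sum(parse_formula(formula).values())
    print(f"{name}: {atoms} atoms, eps = {eps}, "
          f"Abs 0.1% = {abs_01_percent(eps, mw):.3f}, {stability_class(ii)}")
```

With these inputs the recomputed atom counts (545, 801, 2494) and Abs 0.1% values agree with Table 3, and setting the cystine count to 0 reproduces the reduced-Cys absorbances given in footnote 3 (0.777 for Iran 1-2, 0.253 for Tunisia 1).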