COVID-19 is closely related to SARS
We use genomic nucleic acid sequence data to reconstruct the phylogenetic relationship between COVID-19 and seven other coronaviruses. The sequences are aligned with MAFFT using the G-INS-1 strategy; the scoring matrix for amino acid sequences is BLOSUM. The optimal tree under the widely used maximum likelihood (ML) criterion is found with RAxML. The phylogenetic position of the COVID-19 spike protein is analysed with the same strategy, based on its amino acid sequence. Both tree topologies indicate that the COVID-19 nucleic acid sequence and its spike protein amino acid sequence are each much closer to those of SARS-CoV than to those of any other species included in this analysis (BP = 100 for both trees) (Fig. 1a,b).
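The bootstrap proportion (BP) quoted for a clade is the percentage of bootstrap replicate trees in which that clade appears. The following is a minimal sketch of that bookkeeping only, assuming each replicate tree has already been reduced to the set of clades it contains. The function name, the taxon labels and the toy replicates are hypothetical and are not data from this study; RAxML computes these values internally.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def bootstrap_proportion(clade: Clade, replicates: Iterable[Set[Clade]]) -> float:
    """Return the percentage of replicate trees that contain `clade`."""
    trees = list(replicates)
    if not trees:
        raise ValueError("need at least one replicate tree")
    hits = sum(1 for tree in trees if clade in tree)
    return 100.0 * hits / len(trees)


# Hypothetical example: taxa A and B group together in every replicate.
target = frozenset({"A", "B"})
toy_replicates = [
    {target, frozenset({"C", "D"})},
    {target},
    {target, frozenset({"C", "E"})},
]
print(bootstrap_proportion(target, toy_replicates))
```

A value of 100 therefore means that the grouping is recovered in every resampled data set, which is the situation reported for both trees here.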
Modeling the COVID-19 spike protein
We then perform homology modeling with SWISS-MODEL to simulate the 3D structure of the COVID-19 spike protein. The template sequence is the SARS-CoV spike protein, and the similarity between the modeled protein sequence and the template sequence is 76.47%. The predicted structure passes the PROVE test but fails the VERIFY, ERRAT and PROCHECK tests (GMQE = 0.73, QMEAN = -3.63) (Fig. 2a,b). Each of the other predicted models has an unqualified GMQE or QMEAN.
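For readers unfamiliar with how a target-template percentage is obtained, the sketch below computes percent identity from a pairwise alignment by counting identical residues over the columns where neither sequence has a gap. This is one common convention, assumed here for illustration; SWISS-MODEL may define and compute its reported value differently, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real spike protein sequence): 4 of 5 ungapped columns match.
print(percent_identity("ACDE-G", "ACDFHG"))
```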
Because the model fails three independent tests, we can only use the spike protein structure of SARS-CoV to carry out the protein molecular docking experiment.
C-type lectin interacts with the spike protein, inhibiting formation of the ACE2-spike complex
We select the SARS-CoV spike protein (PDB ID: 5wrg), which has a known structure (sequence similarity up to 75.4% and coverage up to 99%), for the protein molecular docking experiment. The published structures of the spike protein (PDB ID: 5wrg) and of a C-type lectin (macrophage C-type lectin, CLEC4D; PDB ID: 3whd) are used in this experiment. The online tool ZDOCK is used to simulate the docking and to filter the predictions with its built-in scoring. The prediction is shown in Fig. 3a,b.
ACE2 mediates the entry of SARS-CoV into host cells by binding the viral spike protein. The binding site lies within the RBD (receptor-binding domain, N318-V510) [19], as shown in Fig. 3c,d (PDB ID: 6cs2). The molecular docking result shows that the binding site of the C-type lectin also lies within the RBD of the spike protein, and that the docked C-type lectin spatially obstructs formation of the ACE2-spike complex (Fig. 3c,d,e,f).
Changes in the expression profile of the C-type lectin family indicate potential therapeutic targets
To further test whether the C-type lectin family participates in resistance to the virus, we mine transcriptome data on the mouse response to virus infection [20]. We find that the expression profile of the C-type lectin family changes significantly during the first seven days after SARS-CoV infection in mice (Fig. 4a). The expression levels of Nfkb2, Tnf, Nfkbie, Clec4a3, Clec4e, Clec14a, Clec12b and Clec4d peak on the same day that mouse body weight reaches its minimum [20]. The expression levels of Clec12a, Clec7a and Clec11a rise as the mice recover from SARS, indicating potential roles against the virus. Clec4a3, Clec4e, Clec14a, Clec12b and Clec4d show expression trends similar to those of Nfkb2, Tnf and Nfkbie, indicating that they may participate in the C-type lectin-dependent immunological mechanism during the first two days. The actual roles of the C-type lectin family members still need to be tested functionally.
We also ask which types of immune cells cooperate with C-type lectins during infection. CD (cluster of differentiation) molecules are a class of cell surface molecules expressed on various types of immune cells [21], and they are often used as markers to identify different immune cell types. We therefore expand our dataset for clustering analysis by adding all CD markers identified in McDermott's transcriptome data, in order to predict the cell types that participate in a C-type lectin-dependent manner.
The expression levels of Cd59b and Cd209f are positively correlated with those of Clec12a, Clec7a and Clec11a, indicating that the cell types they represent may carry out antiviral roles mediated by Clec12a, Clec7a and Clec11a. Cd28, Cd3d, Cd6, Cd247, Cd27, Cd3g, Cd8a, Cd48, Cd226, Cd8b1, Cd3e, Cd2, Cd19, Cd5, Cd4, Cd160, Cd79b and Cd209a all show a negative correlation with the inflammatory reaction, and their expression levels peak when most of the mice have recovered from SARS, indicating that the cell types they represent have a potential function against the virus. The expression levels of Cd80, Cd300lf, Cd209b, Cd244, Cd300e and Cd177 decrease throughout, possibly because the cell types they represent are susceptible to the virus (Fig. 4b).
Fig. 4b. Expression of CD markers and C-type lectin family members, as described in the preceding paragraph. Data are normalized with Z-score for clustering analysis.
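The Z-score normalization and the positive or negative correlations discussed above can be illustrated with a short sketch. It assumes each gene is represented by a list of expression values over the same time points, uses the population standard deviation, and computes the Pearson correlation as the mean product of Z-scores. The function names and the toy profiles are hypothetical and are not values from the transcriptome data [20].

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Centre a profile on its mean and scale by its population standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("profile is constant; Z-score is undefined")
    return [(v - mu) / sd for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, computed as the mean product of the two Z-scored profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same time points")
    zx, zy = z_score(x), z_score(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Toy profiles over four time points (hypothetical values).
rising = [1.0, 2.0, 3.0, 4.0]
falling = [8.0, 6.0, 4.0, 2.0]
print(pearson(rising, rising), pearson(rising, falling))
```

Because Z-scoring removes differences in absolute expression level, genes with very different magnitudes but similar temporal trends end up close together in the clustering.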
Inferring the C-type lectin-dependent CD4/CD28 T cell survival network
We infer the C-type lectin-dependent CD4/CD28 T cell survival network (detailed in reference [22]). We presume that Clec7a, Clec12a and Clec11a, which correlate positively with CD4 and CD28 and negatively with TNF and NF-κB (Fig. 5), act as inhibitors of apoptosis, and we compare the network's dynamic landscape with experimental data to verify the predicted function of these C-type lectin family members.
EMT provides a general framework to quantify the network and transform it into a nonlinear dynamical system. Three states are found to underlie the T cell survival endogenous molecular network (TEMT): states A, B and C. State B is a saddle point, and the network dynamics are constructed by introducing stochastic fluctuation [23]. The presumed roles of Clec7a, Clec12a and Clec11a are verified, and the predicted expression trends of EGF, IKK, AKT, ASK and cFLIP (A to B to C) are all in good agreement with the experimental data (day 2 to day 4 to day 7) (Fig. 6a,b).
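To illustrate what two stable states separated by an unstable state with added stochastic fluctuation look like, the sketch below simulates a one-dimensional toy system, dx/dt = x - x^3 plus noise, with the Euler-Maruyama scheme. This is not the TEMT model and contains none of its variables or parameters. It is an assumed stand-in in which x = -1 and x = +1 play the role of the stable states and x = 0 plays the role of the unstable intermediate state; all names and values are hypothetical.

```python
import math
import random


def drift(x: float) -> float:
    """Toy double-well drift: stable fixed points at -1 and +1, unstable at 0."""
    return x - x ** 3


def simulate(x0: float, sigma: float, steps: int = 5000,
             dt: float = 0.01, seed: int = 0) -> float:
    """Euler-Maruyama integration of dx = drift(x) dt + sigma dW; returns the final x."""
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    x = x0
    for _ in range(steps):
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift(x) * dt + noise
    return x


# Without noise, the side of the unstable point on which the trajectory
# starts decides which stable state it settles into.
print(simulate(0.1, sigma=0.0), simulate(-0.1, sigma=0.0))
# With noise, trajectories fluctuate around a stable state and may cross over.
print(simulate(0.1, sigma=0.5))
```

In a multi-dimensional network the unstable intermediate state is a saddle, attracting along some directions and repelling along others, which is why it can act as a transition state between the two stable states.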
In summary, the COVID-19 nucleic acid sequence and spike protein amino acid sequence are each more closely related to those of SARS-CoV than to those of any other species included in this analysis. Molecular docking with the SARS-CoV spike protein suggests that C-type lectin may inhibit the interaction between ACE2 and the spike protein. The expression profile of the C-type lectin family changes significantly during infection, and the correlations among C-type lectins, Tnf, NF-κB and some CD markers are consistent with C-type lectins activating an immunological mechanism against the virus: the activation of NF-κB and TNF is important to the host immune response during infection [20]; the activation of NF-κB signaling can alleviate SARS pathological characteristics [20, 24]; C-type lectin can activate NF-κB signaling [20, 25, 26]; and NF-κB and TNF have an indirect regulatory relationship after coronavirus infection [20, 27]. Together, these observations indicate that C-type lectins and the related immune cells are potential therapeutic targets for SARI. We also infer a C-type lectin-dependent T cell network, and the modeling results are verified by experimental data.
Studies have shown that macrophage-derived C-type lectin can recognize TDM (trehalose 6,6'-dimycolate) and activate NF-κB signaling [20, 25, 27]. TDM is a surface antigen of bacteria such as mycobacteria; it can be recognized by C-type lectin and induces the immune response of macrophages [28, 29]. TDM can also induce pneumonia and activate the immune function of Th cells [30, 31]. It is therefore also logical to try an aqueous TDM solution as an antigen adjuvant to activate the adaptive immunity of the organism against COVID-19 [26, 31, 32]. This prediction is supported to some extent by other studies: CLEC4D (PDB ID: 3whd) can activate NF-κB in a CARD9/Bcl10/Malt1-dependent manner for TDM-induced Mincle expression, and activating NF-κB signaling can alleviate SARS pathological characteristics [20, 24, 26]. In addition, the expression of CLEC4D correlates positively with that of NF-κB, and CLEC4D interacts with the spike protein to inhibit spike-ACE2 complex formation, as illustrated above. We also notice that CD209 is highly expressed on day seven. One of its roles is to facilitate infection of permissive cells in vitro driven by pseudotypes bearing the SARS-CoV spike protein, but SARS patients with CD209 do not show a significantly higher chance of a poorer prognosis (60% is not persuasive) [33], which calls for further elucidation of the function of the corresponding cells during virus infection.
Meanwhile, to alleviate the symptoms of SARI, we suggest considering some drugs that are effective in treating TDM-induced pneumonia, given the similarity in antigen structure: Radix Sophorae [34] and lactoferrin [35]. Drugs that increase the number of immune cells and activate cytokines such as TNF and IL6 should also be taken into consideration, for example Astragalus membranaceus [36–39].