By surveying the PubMed database, we identified 149 experimentally verified small-molecule inhibitors whose SARS-CoV-2 drug targets and drug-interacting viral residues are known. They include the FDA-approved drug, remdesivir, as well as EUA-approved nirmatrelvir, but not molnupiravir, since there is no molnupiravir-bound SARS-CoV-2 RdRp structure. Supplementary Table S1 lists, for each viral protein target, the drug candidates, the PDB code of the viral protein/inhibitor complex, and the drug-interacting SARS-CoV-2 residues.
Most of the drug candidates in Supplementary Table S1 target a specific viral protein. However, some of them can bind to multiple sites in the same protein. For example, YM155, an anti-cancer drug in clinical trials, is found in three disparate sites of papain-like protease (PLpro) in the crystal structure of the SARS-CoV-2 PLpro-YM155 complex26. Six drug candidates, viz., suramin, quercetin, compounds 7 and 13, ebselen, and disulfiram, target more than one SARS-CoV-2 protein. Suramin, a highly negatively charged molecule that has been used to treat African sleeping sickness and river blindness, binds to both SARS-CoV-2 Mpro and RdRp. It is thought to act at an allosteric site in Mpro, causing conformational changes that alter protease activity27. It can also bind to the RdRp active site, blocking the binding of both RNA template and primer strands28. Quercetin, identified as a SARS-CoV-2 Mpro competitive inhibitor by an activity-based experimental screening, binds to the Mpro catalytic site29 as well as the spike receptor-binding domain30. It exhibits a dose-dependent destabilizing effect on the protease and inhibits the interaction between spike and human angiotensin-converting enzyme 2 (ACE2)30. Compounds 7 and 13, found using pharmacophore-based virtual screening, are peptidomimetic inhibitors of Mpro and PLpro as well as human furin protease31. Ebselen and disulfiram are Zn2+-ejecting compounds that can simultaneously target reactive cysteines (free or Zn2+-bound) in multiple SARS-CoV-2 nonstructural proteins (nsps) comprising a replication transcription complex that replicates and produces subgenomic mRNAs encoding accessory and structural proteins32–34. Notably, ebselen forms a covalent bond with the catalytic Cys in Mpro, as seen in the 2.05 Å crystal structure of ebselen bound to Mpro35.
The results in Supplementary Table S1 show that efforts to develop SARS-CoV-2 antivirals have focused on (i) nsp5 Mpro (the most targeted protein), (ii) the nsp3 PLpro domain, and (iii) the nsp12 RdRp catalytic domain. Both Mpro and PLpro are excised from the viral polyproteins (pp1a and pp1ab) by their own proteolytic activities. For each of these three drug target proteins, we outline below its functions, overall structure, and distinct binding sites/motifs from available structures in the Protein Data Bank (PDB) 36. Then, we describe where the drug ligands bind and the selection pressure on the drug-binding residues, which are numbered according to the respective PDB structure rather than the coding sequence. We underscore those SARS-CoV-2 Mpro, PLpro, and RdRp residues under positive selection, as their mutations might affect drug efficacy based on their reported roles.
SARS-CoV-2 Mpro (3CLpro or nsp5)
The main protease (Mpro), also called chymotrypsin-like protease (3CLpro) or nsp5, is a cysteine protease that cleaves the two viral polyproteins into 16 constituent nsps that are crucial for viral replication and maturation. It is the most popular SARS-CoV-2 nsp drug target because (i) it is essential for viral replication, (ii) it has no human homolog but is conserved among coronaviruses, and (iii) it has unique cleavage specificity, cleaving sequences after a Gln, unlike known human cysteine proteases37–40. Drugs targeting Mpro would therefore have reduced off-target activities and thus fewer side effects41.
Monomeric Mpro consists of an N-terminal finger (residues 1–7) and three domains: the chymotrypsin-like domain I (residues 8–101), the picornavirus 3C protease-like domain II (residues 102–184) and domain III (residues 201–306)42. Dimerization is needed for Mpro function, as interaction between the protomers, in particular the interaction between the N-terminal S1 of one protomer and E166 of the other protomer, keeps the enzyme in an active conformation37. Thus, the N-terminal finger, E166, and the unique catalytic C145-H41 dyad play a vital role in proteolytic activity. Mpro has two distinct binding regions (Fig. 1): (i) a substrate-binding site, containing the catalytic C145-H41 dyad, located in the cleft between domains I and II, and (ii) the dimerization interface involving residues from the N-terminal finger, the catalytic cleft and domain III39,43−45.
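The domain boundaries quoted above can be turned into a small lookup for placing a drug-interacting residue in the Mpro fold. The following Python sketch is illustrative only: the function name `mpro_region` and the label returned for residues 185–200 are our own assumptions, since the text does not assign those residues to any domain.

```python
# Region boundaries exactly as quoted in the text (PDB residue numbering).
MPRO_REGIONS = [
    ("N-terminal finger", 1, 7),
    ("domain I", 8, 101),
    ("domain II", 102, 184),
    ("domain III", 201, 306),
]


def mpro_region(resnum: int) -> str:
    """Return the Mpro region that contains the given residue number."""
    for name, start, end in MPRO_REGIONS:
        if start <= resnum <= end:
            return name
    if 185 <= resnum <= 200:
        # Assumption: the text gives no domain for residues 185-200,
        # so they are reported as unassigned rather than guessed.
        return "unassigned (185-200)"
    raise ValueError(f"residue {resnum} is outside Mpro (1-306)")


if __name__ == "__main__":
    for res in (1, 41, 145, 166, 191):
        print(res, mpro_region(res))
```

With these boundaries, H41 falls in domain I and C145 in domain II, consistent with the catalytic dyad lying in the cleft between the two domains.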
Figures 1b and 1c show, in parentheses, the number of Mpro inhibitors targeting (i) the catalytic C145-H41 dyad (purple), (ii) substrate-binding residues (light blue), (iii) dimerization interface residues (pink), and (iv) residues shared by the catalytic cleft and the dimer interface (yellow). All 94 inhibitors targeting Mpro, including the EUA-approved drug nirmatrelvir (PF-07321332), bind in the catalytic cleft. They most frequently target the catalytic C145-H41 dyad (74 and 65 compounds, respectively) as well as E166 (69 compounds), which is important for dimerization. However, 3 of the 94 drug candidates (omeprazole, punicalagin, and chebulagic acid) also target two residues (S1 and K137) at the dimer interface. Punicalagin and chebulagic acid are also allosteric inhibitors of Mpro enzymatic activity27,46.
Figure 2 depicts the SARS-CoV-2 Mpro residues that exhibit evidence (p < 0.1) for negative selection (blue) or positive selection (red) in any of the ten rounds of sampling, or no evidence for negative/positive selection (white). For example, out of ten sampling rounds, the catalytic C145 showed evidence of negative selection in 4 rounds, but no evidence of positive/negative selection in the other rounds. Most of the residues targeted by the Mpro inhibitors44,45, viz., T25, T26, H41, Y54, K137, F140, L141, N142, S144, C145, H163, H164, E166, L167, P168, H172, D187, R188, Q189, T190, and Q192, are under negative selection. The other drug-interacting residues (S1, T24, M49, G143, M165) show no evidence for negative/positive selection, but are highly conserved. Residues that are under positive selection do not directly interact with the Mpro inhibitors, except for A191.
A191 displayed evidence of positive selection in 2 of the 10 sampling rounds. It is targeted by 6 drugs; viz., PF-00835231, efonidipine, nelfinavir, bisindolylmaleimide IX, as well as compounds 2a and 151. PF-00835231, a ketone-based covalent inhibitor, forms van der Waals interactions with the A191 backbone47. However, due to its low oral bioavailability, it has been superseded by the oral drug, PF-07321332 (EUA-approved nirmatrelvir), which does not interact with any residue under positive selection pressure. Interestingly, G15, K90, and P132, which are often mutated in current SARS-CoV-2 variants of concern 42, are under positive selection. Since the mutation of K90 to Arg is expected to improve dimerization 42, it may affect compounds that target the dimer interface.
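The colouring rule used for Fig. 2 (evidence at p < 0.1 in any of the ten sampling rounds) can be sketched in Python as follows. This is a minimal illustration, not the analysis pipeline itself: the function name, the input layout (one p-value per round for each test), the p-values in the usage example, and the "mixed" label for a residue that passes both tests are all assumptions made here.

```python
from typing import Sequence, Tuple

ALPHA = 0.1  # significance threshold quoted in the text


def classify_residue(
    neg_pvalues: Sequence[float],
    pos_pvalues: Sequence[float],
    alpha: float = ALPHA,
) -> Tuple[str, int, int]:
    """Label a residue from its per-round selection p-values.

    Returns (label, rounds with negative-selection evidence,
    rounds with positive-selection evidence).
    """
    n_neg = sum(p < alpha for p in neg_pvalues)
    n_pos = sum(p < alpha for p in pos_pvalues)
    if n_neg and n_pos:
        label = "mixed"  # assumption: the text does not cover this case
    elif n_pos:
        label = "positive"
    elif n_neg:
        label = "negative"
    else:
        label = "none"
    return label, n_neg, n_pos


if __name__ == "__main__":
    # Hypothetical p-values that reproduce the pattern reported for C145:
    # evidence of negative selection in 4 of 10 rounds, none for positive.
    neg = [0.03, 0.50, 0.08, 0.40, 0.02, 0.60, 0.09, 0.30, 0.70, 0.20]
    pos = [1.0] * 10
    print(classify_residue(neg, pos))
```

Under this rule a residue such as A191, with evidence of positive selection in only 2 of the 10 rounds, is still labelled as positively selected, so the round counts are returned alongside the label.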
SARS-CoV-2 PLpro
SARS-CoV-2 nsp3-encoded PLpro protease is also a popular drug target, as it is involved in viral replication and host immune response suppression and is conserved among coronaviruses40,48. This protease recognizes the LXGG↓(X) cleavage motif at the nsp1/2, nsp2/3, and nsp3/4 boundaries of the viral polyprotein and at the C-termini of host ubiquitin and interferon-stimulated gene 15 (ISG15)49. Hence, in addition to cleaving viral substrates, PLpro also cleaves post-translational modifications on host proteins to evade antiviral immune responses50. Unlike Mpro, PLpro employs a catalytic triad (C111-H272-D286) and is catalytically active as a monomer. PLpro consists of an N-terminal ubiquitin-like subdomain and a right-handed thumb-finger-palm catalytic unit48. It has four binding sites (Fig. 3a): a Zn2+-binding site, a viral substrate-binding channel, and two host ubiquitin/ISG15-binding subsites called SUb1 and SUb2 26,40,42,43,48,51. The Zn2+-binding site, lined by 4 conserved cysteines (Fig. 3b), is essential for structural integrity and protease activity52. The SUb2 subsite consists of D62, R65-V66, F69-E70, H73, T75, N128, N177, and D179 (Fig. 3c). The SUb1 subsite consists of W106-Y112, E161-D164, R166-E167, L199, E203, P223, T225, K232, P248, Y264, Y268-G271, Y273, and T301 (Fig. 3d)53. Notably, W106 and N109 contribute to the stabilization of the oxyanion transition state of peptide hydrolysis40, whereas L162 and E167 are involved in interactions with host ISG15 54. The SUb1 subsite partially overlaps with the viral substrate-binding channel containing the C111-H272-D286 catalytic triad, G163-D164, P247-P248, Y264, and a flexible loop termed BL2 (residues 267–271)40,42,43,48,51. The BL2 loop is important as it recognizes the LXGG motif between viral proteins and closes upon substrate/inhibitor binding51.
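The partial overlap between the SUb1 subsite and the viral substrate-binding channel can be made explicit by intersecting the two residue lists given above. The Python sketch below is only an illustration: the residue ranges are transcribed from the text, whereas the helper `expand` and the variable names are ours.

```python
from typing import Iterable, Set, Tuple


def expand(spans: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand inclusive (start, end) residue ranges into a set of numbers."""
    residues: Set[int] = set()
    for start, end in spans:
        residues.update(range(start, end + 1))
    return residues


# SUb1 subsite: W106-Y112, E161-D164, R166-E167, L199, E203, P223, T225,
# K232, P248, Y264, Y268-G271, Y273, and T301.
SUB1 = expand([
    (106, 112), (161, 164), (166, 167), (199, 199), (203, 203),
    (223, 223), (225, 225), (232, 232), (248, 248), (264, 264),
    (268, 271), (273, 273), (301, 301),
])

# Substrate-binding channel: C111-H272-D286 triad, G163-D164, P247-P248,
# Y264, and the BL2 loop (residues 267-271).
CHANNEL = expand([
    (111, 111), (272, 272), (286, 286), (163, 164),
    (247, 248), (264, 264), (267, 271),
])

shared = sorted(SUB1 & CHANNEL)        # residues lining both sites
channel_only = sorted(CHANNEL - SUB1)  # channel residues outside SUb1

if __name__ == "__main__":
    print("shared:", shared)
    print("channel only:", channel_only)
```

With the residue lists as quoted, nine residues (111, 163, 164, 248, 264, and 268–271) are shared, whereas 247, 267, 272, and 286 belong to the channel only.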
Most of the PLpro inhibitors target the active-site cleft, 3 compounds target the Zn2+-binding site, and only one compound (YM155) is found in the SUb2-binding site (Fig. 3). Most of the drug candidates target residues involved in binding the substrate in the SUb1-binding site. In particular, Y268 is the most frequently drug-targeted residue (11 compounds), followed by D164, P248, and Y264 (10 compounds each), and Q269 (8 compounds). Two compounds, VIR250 and VIR251, are covalently bonded to the catalytic C111 55.
Comparison of Figs. 2 and 4 shows that there are more residues under positive selection (red residues) in PLpro than there are in Mpro. Nearly all the drug-interacting residues that are under positive selection are located in the SUb1 subsite, which binds host ubiquitin and ISG15 proteins. These residues include Y268, Y264, G271, and T225, which are targeted by 11, 10, 2, and 1 inhibitor(s), respectively. Notably, Y268 in the BL2 loop can form hydrogen-bonding and/or π-stacking interactions with the drug candidates; hence, its mutation could affect the BL2 loop conformation and attenuate drug interactions. Indeed, the mutation of SARS-CoV-2 PLpro Y268 to Thr or Gly substantially reduced the inhibitory effect of the non-covalent inhibitor, GRL-0617 50. Another drug-interacting residue under positive selection is P299, which forms hydrophobic contacts with only 1 drug candidate, XR8-24 56. Interestingly, the 2.1 Å crystal structure of the SARS-CoV-2 PLpro-YM155 complex (PDB 7D7L) shows YM155 forming van der Waals or hydrogen-bonding interactions with (i) C192, Q195, T225, and C226 in the Zn2+-binding site, (ii) P248, Y264, Y268, and Y273 in the viral substrate-binding channel, and (iii) F69 and H73 in the SUb2 subsite26. Although C192 and H73 are under negative selection, neighboring G193 and Y71, respectively, are under positive selection. Since G193, T225, Y264, Y268 and Y71 are under positive selection, their mutations may attenuate binding of YM155 to all 3 sites.
Apart from Y71 and G193, several other residues under positive selection are also near the drug-interacting residues. Positively charged K232 is near the negatively charged Zn2+-site (Fig. 3b), and its mutation to Gln present in the SARS-CoV-2 gamma variant of concern (K232Q) enhanced ubiquitin cleavage in vitro, which could affect the host immune response in infected cells53. R166 is near two popular drug-interacting acidic residues, D164 and E167, whereas (V159, G160), (Y207, G209, T210), and K297 are adjacent in sequence to E161, M208, and P299, respectively, which each interact with only one inhibitor (Fig. 3d). Surprisingly, D286 is under positive selection even though it is part of the catalytic triad. By forming a hydrogen bond with the H272 side chain, D286 serves to align H272 to act as a general acid/base during catalysis42. This role of D286 may be compensated by a buried water molecule as found in Mpro, which lacks a third catalytic residue.
RNA-dependent RNA Polymerase (nsp12)
The nsp12 RdRp is another key drug target because it is responsible for viral RNA synthesis, and is highly conserved among coronaviruses with no known mammalian homologs16. The nsp12 subunit consists of three domains: the N-terminal nidovirus RdRp-associated nucleotidyl-transferase domain (NiRAN, residues Q117–A250), the interface domain (residues L251–R365), and the finger–palm–thumb RdRp catalytic domain (residues L366–L932)40. By itself, nsp12 shows little or no polymerase activity; it requires the nsp7 and nsp8 cofactors, which increase nsp12 binding to the template-primer RNA 5. Two conserved Zn2+-binding motifs (H295, C301, C306, C310 and C487, H642, C645, C646) maintain the structural integrity of RdRp 5. In addition to the two Zn2+-binding sites, seven conserved structural motifs (labelled A–G) in the RdRp catalytic domain are involved in binding the RNA template and primer strands and/or incoming nucleotide. During the template-directed RNA synthesis, the single-stranded RNA template passes along a groove clamped by motifs F (T538-V560) and G (K500-R513) and enters the active site composed of motifs A-D 57. Motifs A (N611-M626) and C (F753-N767) contain the catalytic 618DX4D623 and 759SDD761 motifs, respectively, where the conserved acidic residues are involved in regulating catalytic activity and binding two catalytic Mg2+ ions57. Motif B (T680-T710) contains a flexible loop (S682-T686) involved in template binding and translocation of the nascent dsRNA57. Motif E (H810-K821) interacts with the primer RNA strand 5, whereas motifs D (L775-E796) and F interact with the incoming NTP phosphate group57.
Nearly all identified nsp12 drug candidates, including FDA-approved remdesivir, target residues comprising the conserved structural motifs in the nsp12 catalytic domain. They most frequently interact with positively charged R555 in motif F, which contacts the +1 base of the primer-strand RNA; negatively charged D623 in the catalytic 618DX4D623 motif; and S682 and N691 in motif B (see Fig. 5). None of the nsp12 drug candidates identified bind to the two Zn2+-sites or motif D.
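To place a given nsp12 residue within the domains and conserved motifs described above, the quoted residue ranges can be encoded as a simple lookup. The Python sketch below is illustrative: the boundaries are taken from the text, whereas the function name `locate_nsp12` and the use of `None` for residues outside every motif are our own choices.

```python
from typing import Optional, Tuple

# Domain and motif boundaries as quoted in the text (PDB residue numbering).
NSP12_DOMAINS = [
    ("NiRAN", 117, 250),
    ("interface", 251, 365),
    ("RdRp catalytic", 366, 932),
]
NSP12_MOTIFS = {
    "A": (611, 626),
    "B": (680, 710),
    "C": (753, 767),
    "D": (775, 796),
    "E": (810, 821),
    "F": (538, 560),
    "G": (500, 513),
}


def locate_nsp12(resnum: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (domain, motif) for an nsp12 residue; None where unassigned."""
    domain = next(
        (name for name, start, end in NSP12_DOMAINS if start <= resnum <= end),
        None,
    )
    motif = next(
        (m for m, (start, end) in NSP12_MOTIFS.items() if start <= resnum <= end),
        None,
    )
    return domain, motif


if __name__ == "__main__":
    for label, res in [("R555", 555), ("D623", 623), ("S682", 682), ("N691", 691)]:
        print(label, locate_nsp12(res))
```

Applied to the most frequently targeted residues, the lookup places R555 in motif F, D623 in motif A, and both S682 and N691 in motif B, all within the RdRp catalytic domain.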
Most of the drug-interacting residues, in particular the 759SDD761 catalytic residues, are under negative selection (Fig. 6). Notably, S861, which plays a key role in the delayed chain termination mechanism of remdesivir, is under negative selection. However, R555, which is most frequently targeted by the SARS-CoV-2 RdRp inhibitors including remdesivir, shows no evidence for either negative or positive selection. On the other hand, in vitro evolution studies have identified three nsp12 mutations, viz., S759A, V792I, and E802(A/D), that confer resistance to remdesivir17,18,58. However, S759, which is part of the 759SDD761 catalytic motif, and V792 are both under negative selection, suggesting that their mutations would decrease SARS-CoV-2 fitness. Although highly conserved E802 shows no evidence for either negative or positive selection, E802(A/D) mutants decreased viral replication relative to wild-type SARS-CoV-2 nsp12 in in vitro assays, indicating that E802 mutations impart a fitness cost58.
None of the drug-interacting SARS-CoV-2 RdRp residues are under positive selection; however, some are near residues that are under positive selection. For example, T324, which displayed evidence of positive selection in all 10 sampling rounds, is next to two prolines (P322 and P323) that are predicted to interact with the inhibitor Taroxaz-104 59. Another residue under positive selection, T582, is close to A580, which has packing interactions with suramin in the crystal structure of the SARS-CoV-2 RdRp bound to suramin (PDB 7D4F).