A novel approach for selecting potent peptide inhibitors of the SARS-CoV-2 M pro protease

Here, we describe a new high-throughput selection technology for identifying highly specific and effective peptide inhibitors. The technology co-expresses a cytotoxic protein with a library of peptide variants inserted directly into a loop of a carrier protein. Selection is based on neutralization of cytotoxicity: a host cell survives only when a library member binds to and inhibits the cytotoxic protein. The technology provides the flexibility of screening both cyclic and linear peptides. We demonstrate its power by developing selective inhibitors of the main coronavirus protease (M pro ) in a matter of weeks by screening libraries of cyclic and linear peptides. This technology opens up an opportunity to develop inhibitors for a wide range of previously undruggable targets.


Modern day drug discovery has focused on the development of small molecule therapeutics. While small molecules offer many advantages, such as economical manufacturing, lower complexity and better bioavailability as compared to legacy drugs, they can only target 2-5% of the proteome 1,2 . Biologic-based drugs have a larger binding surface and therefore a higher target specificity, allowing them to access targets that are beyond the reach of small molecules.
However, most biologics are large molecules that cannot cross cell membranes, which restricts their use to extracellular targets. Peptide drugs, on the other hand, combine advantages of both small molecule therapeutics and biologic drugs while avoiding many of their disadvantages. Like biologic-based drugs, peptides have a large binding surface, which leads to higher specificity and fewer off-target effects 3,4 . Like small molecules, they are small and have lower immunogenicity 5,6 and higher bioavailability. Recent advances in cell-penetrating peptide technology have enabled peptide drugs to be designed to access intracellular targets 7,8 . Peptide drugs can therefore achieve bioavailability comparable to that of small molecule therapeutics together with the activity and safety of biologic-based drugs, which makes them prime candidates for drug development against previously "undruggable" targets.
The development of therapeutic peptides commonly starts with a combinatorial biology approach that involves the generation of chemical or biosynthetic peptide libraries. Chemical peptide synthesis is a well-established method for developing peptide libraries 9,10 ; however, the biosynthetic approach offers many advantages. One key advantage is the library size.
Biosynthetic libraries can easily contain as many as 10^9 peptides, while chemical synthesis is limited to approximately 10^4 peptides. The most commonly used biosynthetic selection methods are phage display 11 , yeast display 12 and RNA display 13,14 . All of these methods select the peptides that bind most tightly to the target protein. However, a major limitation of these approaches is that the best binders may not be the best inhibitors of the target protein.
One way to solve this problem is to establish a link between binding and function by screening peptides intracellularly for their ability to attenuate or inhibit cellular processes. None of the existing cell-based assays has taken full advantage of this approach. Currently, the most promising in vivo peptide selection method, called split-intein circular ligation of peptides and proteins (SICLOPPS), is based on protein trans-splicing. This involves self-excision of an internal protein segment (intein), resulting in a cyclized polypeptide 15 . Typically, such libraries are screened in E. coli cells using a bacterial two-hybrid system. Selection relies on disruption of a targeted protein-protein interaction (not function), detected through reporter gene expression 16 .
False positive clones often arise from fluctuations in gene expression, mutations in the regulatory sequences and mutations in the bacterial genome. Additionally, construct design for these peptide "processing" enzymes (inteins) is complex, and the enzymes mostly work in a reducing environment 17 and are often slow 18 . To solve these problems, we tested a new selection system based on direct inhibition of a cytotoxic protein that does not depend on transcription of a reporter gene (Fig. 1). Our peptides mimic cyclization by insertion into a protein loop, thus avoiding the need for any processing enzymes (like inteins). This gives the flexibility of screening both cyclic and linear peptides, which further increases the library size and improves the chances of identifying the optimal peptide inhibitor. As a proof of concept for this new approach, we performed selections on a small pool of peptides (10^6 variants) that consisted of cyclic and linear peptide inhibitors targeting the main coronavirus protease (M pro ). Within five weeks, we identified an inhibitor with an IC50 of 33 µM, validating this screening approach.

Results
Selection System. The selection system relies on the toxicity of a particular protein to its host (Fig. 1). A peptide variant is co-expressed with the cytotoxic target protein in the host cell.
The host cell survives only if a peptide variant binds to the protein and neutralizes its cytotoxicity. Using this approach, we developed inhibitors of the difficult-to-target SARS-CoV-2 main protease (M pro ) 19 that have the potential to be developed into pan-coronavirus antiviral drugs, for which there is an urgent unmet medical need given the regular frequency of coronavirus-caused pandemics over the last century.
Libraries. Our constructs are presented in Fig. 2. The peptide libraries were inserted into ubiquitin as a carrier protein, because it is small (8.6 kDa), stable in E. coli, and has previously been used to express proteins and peptides 20,21 . The first peptide library was random, built with 14 degenerate codons, resulting in 1.6 x 10^18 [...] 2) and taken through five rounds of selection in E. coli. We screened 1 million clones at each round. To weed out false positives that may result from frame-shifts, deletions of M pro and somatic mutations, libraries were re-cloned into the original vector (pUbi-Mpro) after each round of selection. The fifth round of selection generated several sequences that were significantly overrepresented in the population. We chose the 11 most abundant peptides for further testing. Seven of these peptides were linear; 4 were fully integrated in the loop of the carrier protein and, therefore, cyclic (Supplementary Table 1). The most abundant peptides were synthesized in a linear form and tested in an in vitro M pro activity assay. Of the 11 peptides tested, 7 did not have any effect on protease activity (false positives). The other 4 peptides inhibited M pro with IC50 values ranging from 100 µM to 1.2 mM (Supplementary Table 1). The two best peptides were M1 (RQGLDEDLHRW) and M5 (TANAFLS). Their IC50 values were 249 and 101 µM, respectively (Table 1). Peptide M1 originated from the random library and peptide M5 originated from the library based on the published sequences that are recognized by M pro . To be consistent with the structure in the original screen, we also synthesized peptide M5 in a cyclic form (peptide M5c) fused to a custom cell penetration sequence to improve its stability and intracellular transport. Cyclization improved the IC50 of peptide M5 significantly, from 101 to 33 µM (Table 1).
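The size of the random library follows from its codon scheme. The minimal Python sketch below enumerates NNK codons (N = any base, K = G or T; the Methods state that the random library was built with 14 NNK codons) against the standard genetic code and recovers the 1.6 x 10^18 figure as 20^14. The variable names, and the comparison with the 10^6 clones screened per round, are illustrative assumptions and not part of the original analysis.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a in BASES for b in BASES for c in "GT"]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
nnk_amino_acids = encoded - {"*"}
nnk_stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

LIBRARY_POSITIONS = 14          # randomized codons, from the text
CLONES_PER_ROUND = 10**6        # clones screened per round, from the text

peptide_diversity = len(nnk_amino_acids) ** LIBRARY_POSITIONS
fraction_screened = CLONES_PER_ROUND / peptide_diversity
# Probability that a random 14-codon NNK insert contains no stop codon.
stop_free_fraction = (1 - len(nnk_stop_codons) / len(nnk_codons)) ** LIBRARY_POSITIONS

print(f"NNK codons: {len(nnk_codons)}, amino acids: {len(nnk_amino_acids)}, "
      f"stop codons: {nnk_stop_codons}")
print(f"Theoretical peptide diversity: {peptide_diversity:.2e}")
print(f"Fraction covered by one round of screening: {fraction_screened:.1e}")
print(f"Stop-free inserts: {stop_free_fraction:.1%}")
```

Under these assumptions, each round samples only a vanishingly small fraction of the theoretical sequence space, and roughly a third of random inserts are expected to contain a stop codon, which would truncate the carrier protein at the insertion site.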
Following the first 5 rounds of selection, peptides M1 and M5 were further mutagenized by PCR with degenerate primers, cloned into the pUbi-Mpro-CAT vector and selected for 5 more rounds.
pUbi-Mpro-CAT was identical to pUbi-Mpro except that the M pro coding sequence was fused to the chloramphenicol acetyl transferase (CAT) gene. This modification made selection more stringent, since cell survival now required not only the presence of the peptide inhibitor but also an intact CAT protein, which prevented bacteria from escaping selection by deleting M pro . In this round of selection there were no false positives, showing that the addition of CAT improved screening effectiveness.
The largest fraction of the resulting sequences was made up of peptides M74 and M78 (Table 1).
These peptide variants were synthesized and tested in the in vitro assay (Table 1). They did not show any improvement under standard conditions (IC50 values were 195 and 244 µM, respectively).
However, under denaturing conditions (heat shock at 53 °C), which we used to model partial protein unfolding, the IC50 of peptide M74 improved 10-fold (21.2 µM, Table 1).

Discussion
A significant disadvantage of current display technologies (e.g., phage display, RNA display, yeast display) is the lack of a connection between binding and function: a peptide that binds to the target protein does not necessarily inhibit its enzymatic activity or disrupt a protein-protein interaction. We solved this problem by developing a screen based on the cytotoxicity of the target protein. Other in vivo selection methods have relied on the toxicity of the substrate of an enzyme (the target protein) 24 , products of the enzymatic reaction 25 , a particular intermediate 26 , or resistance to inhibitors 27 . Our selection approach is the first to capitalize on the cytotoxicity of the target protein itself.
The problem of protein toxicity is widespread in the field of protein expression. Usually, it is a problem that has to be minimized. Our technology leverages this "problem" by screening peptides for their ability to neutralize the cytotoxicity of the target protein. It involves co-expression of the cytotoxic target protein and a library of peptide variants. Host cells survive only when a particular peptide variant inhibits the cytotoxic protein (Fig. 1).
To demonstrate the power of this technology, we targeted the coronavirus M pro protease, which is highly conserved among various coronaviruses and plays a pivotal role in their life cycles 28 . Mutations in M pro are often lethal to the virus, which is why drugs targeting the M pro enzyme have the potential to significantly reduce the risk of mutation-mediated drug resistance and to display broad-spectrum antiviral activity 19 . To date, no M pro targeted antiviral has been developed. Repurposing of antiviral drugs for other viruses 29,30 has not proved effective. Drug development approaches based on converting peptides into peptidomimetics are very challenging, because side chain modifications often abolish inhibitor activity 31 and result in off-target effects 32 and toxicity 33 . Our approach avoids these pitfalls because our peptides are selected intracellularly, biasing libraries towards candidates with lower toxicity and higher stability.
The two best peptides (M1 and M5) generated by our selection system showed inhibitory activity in the low micromolar range in an in vitro assay (Table 1), demonstrating the utility of our selection approach. Peptide M1 was selected from the random library and peptide M5 from the candidate-based library. This observation demonstrates that our approach can identify inhibitors without prior knowledge of their ligands and can improve the inhibitory activity of known ligands. It is also important to note that peptide M5 is fully integrated in the first loop of the carrier protein (ubiquitin), which gives it a cyclic structure. Consistent with this observation, when peptide M5 was cyclized, its IC50 improved significantly, from 101 to 33 µM (Table 1), which confirms that our technology is useful for screening both linear and cyclic peptides.
Interestingly, these peptides were not as potent under native conditions (IC50 values were 238 and 195 µM, respectively; Table 1); however, under denaturing conditions (heat shock at 53 °C) the IC50 of peptide M74 improved 10-fold. We speculate that M74 may inhibit M pro during protein folding. Inhibitors of this type were previously identified for HIV protease 34 and chicken egg lysozyme 35 . It is believed that these inhibitors interfere with protein folding by binding to certain protein sequences called local elementary structures (LES). These local elementary structures are believed to be part of the protein's hydrophobic core, which can only be exposed during partial protein denaturation. Not surprisingly, M74 was the only peptide with a stretch of hydrophobic amino acids (LAVVAL) at its C terminus, which has the potential to interact with the hydrophobic core of the partially unfolded protease.
As for peptide M78, it had 2 cysteines, which are highly reactive groups with the potential to form disulphide bonds not only with the cysteine in the active site of the M pro protease but also with each other in a matter of minutes 36 . Therefore, we suspect that the lifetime of this peptide inhibitor is very short and that we did not use the best conditions to detect its activity.
A weakness of our study is that we only screened a small fraction of all available peptides in our library (1 million clones at each stage of selection). Despite this shortcoming, we were able to rapidly identify (after a few weeks of screening) potent peptide inhibitors with low µM activity from both our random and candidate-based peptide libraries (Table 1), demonstrating the functional utility of our approach.

Constructs.
All genes were codon-optimized, synthesized as gBlocks by IDT and cloned into the pBAD-HisA plasmid (Thermo Fisher Scientific). The pBAD-HisA plasmid was amplified with primers P33 and P34 (Supplementary Table 2) [...] The pUbi-Mpro constructs (Fig. 2) use an arabinose-inducible promoter to express M pro , ubiquitin, or ubiquitin-CAT from a single operon (Fig. 2). A Shine-Dalgarno sequence is inserted between M pro and ubiquitin to ensure the expression of both genes. The ubiquitin gene was synthesized by IDT and amplified with primers T227 and T228 (Supplementary Table 2). The amplified fragment was cut with HindIII and PacI restriction enzymes. PCR conditions were as described above, except that extension was done for 1 minute. The M pro -GST fusion was amplified with primers T229 and P108 (Supplementary Table 2) and cut with PacI and XhoI restriction enzymes as described above. Following amplification, both PCR fragments were gel-purified [...] Table 2). The CAT gene was synthesized by IDT and amplified with primers T334 and T335 (Supplementary Table 2). All fragments were gel-purified with the QIAGEN gel-band purification kit. All fragments were then mixed together, amplified with primers T227 and T335, gel-purified with the QIAGEN gel-band purification kit and cut with HindIII and XhoI restriction enzymes. Finally, the construct was ligated with the pBAD backbone (also cut with HindIII and XhoI), tested and sequenced as described above.

Construction of Peptide libraries. Random and candidate libraries of the M pro -inhibitor peptides were cloned into the first loop of ubiquitin (in the pUbi-M pro constructs), which was shown previously to be tolerant to insertions and deletions 23 .

The random library was built with 14 NNK codons and amplified as two fragments, which were joined by ligation. The first fragment was amplified with the flanking forward primer P23 (Supplementary Table 2) and the reverse primer T232 (Supplementary Table 3). The second fragment was amplified with primer T233 (Supplementary Table 3) and the reverse primer P24 (Supplementary Table 2). The PCR reaction (20 µL) [...]
The M pro candidate library was based on sequences recognized by M pro 22 . These sequences were mutagenized by degenerate synthetic oligonucleotides. Library size was controlled by targeting mutations to one position in each codon, with only the first or the second codon position being changed. Library construction was done as described above with the following differences: the first fragment was amplified with the flanking forward primer P23 (Supplementary Table 2) and one of the reverse primers (Supplementary Table 3, primers T234 through T255). The right fragment was amplified with primer T233 (Supplementary Table 2) and one of the forward primers (Supplementary Table 3, primers T256 through T277). Following amplification, PCR fragments were gel-purified, mixed and ligated with T4 DNA ligase. The full-length PCR band was amplified with the flanking primers P276 and P277 (Supplementary Table 2), digested with KasI and XbaI restriction enzymes and ligated with pUbi-Mpro as [...]
Fusion Peptide Libraries. Sequences corresponding to the best two peptides (M1 and M5) were mutagenized in order to increase diversity. Two DNA fragments, left and right, were amplified and then reunited by ligation (Fig. 3) as described above for other libraries. The first fragment was amplified with the flanking forward primer P23 (Supplementary Table 2) and the reverse primers (Supplementary Table 3, primers T370 through T396 for peptide M5 or T422 through T444 for peptide M1). The second fragment was amplified with the reverse primer P24 and forward primers (Supplementary Table 3, primers T343 through T369 for peptide M5 or T397 through T421 for peptide M1). Fragments were processed, inserted into pUbi-M pro -CAT and transformed into 10G cells as described above for the secondary screen. Following selection, libraries were amplified using primers P276 and P277 and re-inserted into pBAD_Ubi-Mpro-CAT as described above. This selection process was repeated 5 times. Selected sequences were analyzed by NGS and Sanger sequencing as described above.
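Identifying overrepresented sequences after selection amounts to counting how often each peptide occurs among the sequenced inserts. The Python sketch below shows one simple way to rank peptides by their share of reads. It is an illustration only: the function name `rank_enriched`, the read counts and the background peptide `AAAAAAA` are hypothetical, and the analysis pipeline actually used for the NGS data is not described in this section.

```python
from collections import Counter


def rank_enriched(peptides, top_n=11):
    """Return the top_n peptides with their fraction of all reads.

    `peptides` is an iterable of translated insert sequences, one per read.
    """
    counts = Counter(peptides)
    total = sum(counts.values())
    return [(pep, n / total) for pep, n in counts.most_common(top_n)]


# Hypothetical read-out after a round of selection. M5 and M1 are the
# peptide sequences reported in the text; the counts are invented.
reads = ["TANAFLS"] * 5 + ["RQGLDEDLHRW"] * 3 + ["AAAAAAA"] * 2

ranking = rank_enriched(reads, top_n=2)
for peptide, fraction in ranking:
    print(f"{peptide}\t{fraction:.0%}")
```

Comparing these fractions between consecutive rounds would show which sequences are being enriched by selection rather than merely being abundant in the starting library.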
Inhibition of the M pro Protease activity by the peptides in vitro. Peptides were synthesized by Elim Biopharmaceuticals and purified to 95% purity. Inhibitory activity of these peptides on M pro was tested using the 3CL Protease Kit from BPS Bioscience (Catalog # 78042-1) according to the manufacturer's recommendations. Briefly, 30 μl of 3CL Protease enzyme solution (0.05 ng/μl) was mixed with 10 μl of peptide at different concentrations and preincubated for 30 min at room temperature. Following preincubation, 10 μl of 200 μM 3CL Protease substrate was added and the mixture was incubated for 4 h at room temperature. The fluorescence intensity was measured in a microtiter plate-reading fluorimeter with excitation at 360 nm and emission at 460 nm. The heat shock version of this protocol included incubation of the protease and peptide mixture at 53 °C for 5 min, followed by incubation for 4 h at room temperature. The heat shock treatment led to a 35% loss of enzymatic activity in the positive control (enzyme without inhibitor). See Supplementary Fig. 1 for details.
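The volumes above fix the final assay concentrations, and an IC50 can be read from the resulting dose-response data. The Python sketch below works through both steps. It is a minimal illustration under stated assumptions: the helper names are hypothetical, the two-point dose-response data are invented, and the log-linear interpolation is a simple stand-in, since the curve-fitting method used to obtain the reported IC50 values is not given in this section.

```python
import math


def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Concentration of a component after dilution into the full reaction."""
    return stock_conc * added_volume_ul / total_volume_ul


# Volumes from the protocol: 30 ul enzyme + 10 ul peptide + 10 ul substrate.
TOTAL_UL = 30 + 10 + 10
substrate_final_um = final_concentration(200.0, 10, TOTAL_UL)   # uM
enzyme_final_ng_per_ul = final_concentration(0.05, 30, TOTAL_UL)


def percent_activity(signal, blank, uninhibited):
    """Protease activity relative to the no-inhibitor control, in percent."""
    return 100.0 * (signal - blank) / (uninhibited - blank)


def estimate_ic50(concentrations, activities):
    """Interpolate on a log10 concentration scale between the two
    measurements that bracket 50% activity. Returns None if no pair of
    adjacent points brackets 50%."""
    points = sorted(zip(concentrations, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 50.0 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 50.0) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical data: peptide concentration (uM) vs. remaining activity (%).
example_ic50 = estimate_ic50([10.0, 100.0], [75.0, 25.0])

print(f"Final substrate: {substrate_final_um:.0f} uM")
print(f"Final enzyme: {enzyme_final_ng_per_ul:.3f} ng/ul")
print(f"Interpolated IC50 for the example data: {example_ic50:.1f} uM")
```

Whether the peptide concentrations reported for IC50 refer to the 10 μl stock or to the final 50 μl reaction is not stated here; if they refer to the stock, `final_concentration` gives the corresponding five-fold lower value in the well.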