Chemicals and reagents
Unless otherwise stated, all chemicals, reagents, and primers were obtained from Sigma Aldrich (Buchs, CH). Restriction enzymes and their buffers were obtained from New England Biolabs (Ipswich, USA). Synthetic genes were obtained from Integrated DNA Technologies (Leuven, BE) or Twist Bioscience (San Francisco, USA). Kits for plasmid isolation and DNA purification were obtained from Zymo Research (Irvine, USA). Peptides in either purified (>90%) or crude format were obtained from Pepscan (Lelystad, NL). Sanger-sequencing was done at Microsynth (Balgach, CH).
Bacterial strains and cultivations
Unless otherwise stated, all experiments were performed using Escherichia coli TOP10 (F− mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara-leu)7697 galU galK λ− rpsL(StrR) endA1 nupG; Thermo Fisher Scientific, Waltham, USA). In this study, all cultivations were performed either in 14 ml polypropylene tubes (Greiner, Kremsmuenster, AT), filled with 5 ml of lysogeny broth (LB) medium (Difco, Becton Dickinson, Franklin Lakes, USA), or in 96-deep-well polypropylene plates (Greiner, Kremsmuenster, AT) filled with 500 µl of LB-medium. All samples were incubated at 37°C with agitation on a shaker (Kuhner, Birsfelden, CH) operated at 200 r.p.m. and 25 mm amplitude. All media were supplemented with the appropriate antibiotic for plasmid maintenance (50 µg ml−1 kanamycin; 100 µg ml−1 carbenicillin) and 1% (w/v) d-glucose for repression of gene expression from catabolite-repression sensitive promoters such as PBAD. In the case of peptide expression experiments, cultures were incubated without d-glucose and 0.3% (w/v) of the inducer l-arabinose was used for induction. For all cultivations on solid medium, 15 mg ml−1 agar (Difco) was added to the broth, and incubation was performed without shaking in an incubator (Kuhner) at 37°C. If not indicated differently, the optical densities (OD) of bacterial cultures were determined by measuring light scattering at 600 nm using a UV/VIS spectrophotometer (Eppendorf, Hamburg, DE).
In silico generation of peptide library
We collected all peptide sequences (called “parents”) available on the APD in May 2017 (https://aps.unmc.edu/).[4] These sequences were used as input queries to find sequence-similar peptide sequences in the NCBI non-redundant nucleotide collection (nr/nt), a collection that holds sequences from GenBank, European Molecular Biology Laboratory (EMBL), DNA Databank of Japan (DDBJ), and Reference Sequence database (RefSeq), as well as translated protein information from the protein database (PDB).[10] By applying tblastn, 170,300 additional peptide sequences (called “similars”) were found.[29] Because we were limited to 12,412 different peptides with a maximum length of 42 amino acids (the chosen platform for the synthesis of the peptide-encoding oligonucleotides allowed 12,412 different sequences with a maximal length of 170 bases), we discarded similars with sequence similarity to the respective parent of less than 62.2%. The following parameters were used for the tblastn search: maximum sequences = 100; matrix = BLOSUM62; gap costs = 11, 1 (existence, extension); word size = 6; active low complexity filter; adjustment = conditional compositional score matrix adjustment.
Sequence distance among parents and similars
To visualize sequence diversity among parents, we created a sequence-based phylogenetic tree. We performed pairwise global alignment of all parent sequences using the Needleman-Wunsch algorithm, as implemented in the R Bioconductor package ‘Biostrings’ (https://bioconductor.org/packages/release/bioc/html/Biostrings.html). The BLOSUM62 substitution matrix was used to compute the alignment scores, which were converted into pairwise distances using the Scoredist method.[30] Based on the pairwise distances between parents, we used hierarchical clustering with average linkage to compute a dendrogram of sequences reflecting their similarities. Parents and their tblastn-derived similars were consolidated into groups, which were named after the parent from the APD (https://aps.unmc.edu/). In the sequence-based phylogenetic tree, each similar was stacked on top of its parent at the tip of the dendrogram. A similar may appear multiple times if it was found multiple times in the tblastn search using different parents.
Peptide-encoding DNA architecture
The corresponding oligonucleotide sequences of the peptide library were synthesized using microarray technology supplied by CustomArray Inc. (now GeneString, Piscataway, USA). The chosen platform allowed 12,412 different oligonucleotides with a maximal length of 170 bases. A generic oligonucleotide design employing four functional units was created (Figure S3): a coding unit, a filler unit, and two universal units for amplification. This process was automated for each sequence by using an in-house written script in R. The coding unit contained the reverse translation of the peptide amino acid sequence into a codon-optimized DNA for E. coli. We always chose the most abundant codon for each amino acid. In cases in which restriction sites had been introduced that could potentially interfere with subsequent manipulations, the crucial codon was replaced by the second most abundant one for this amino acid. The filler sequence was added to compensate for the various lengths of peptide genes (shortest coding sequence = 15 nucleotides, longest coding sequence = 126 nucleotides) and adjust the total of filler and coding unit to 129 nucleotides for all members of the library. To do so, we first added a UAA stop codon to the end of the coding sequence and then added downstream a semi-random sequence, ensuring a GC content of 40% for the filler sequence and limiting the number of identical nucleotides following each other to three. By adding this filler sequence, we maximized sequence disparity at the DNA level (many coding sequences are homologs), thereby potentially increasing both synthesis and, later, sequencing quality. Two amplification units, of 23 and 18 bases, respectively, were appended upstream and downstream of the coding sequence and filler unit and contained the ribosomal binding site and restriction sites for the enzymes PstI and HindIII.
To amplify the peptide-encoding DNA, primer 1 (CTGCACAAAGCTTACGTG), complementary to the upstream amplification unit, and primer 2 (CACGTAAGCTTTGTGCAG), reverse complementary to the downstream amplification unit, were used. The final 170-base oligonucleotide sequences as synthesized are listed by ID in File S2 (erroneous sequences were discarded).
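The design of the coding and filler units described above can be sketched as follows. This is a minimal illustration in Python, not the in-house R script: the codon table lists commonly preferred E. coli codons and is an assumption, the function names are hypothetical, the 129 nucleotides are read as coding unit plus stop codon plus filler, and the restriction-site check with fallback to the second most abundant codon is omitted.

```python
import random

# Assumed codon preferences for E. coli (illustrative only, not taken from the study).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"         # DNA form of the UAA stop codon
INSERT_LENGTH = 129  # coding unit + stop codon + filler (interpretation of the text)
GC_TARGET = 0.40     # GC content of the filler
MAX_RUN = 3          # maximum number of identical consecutive nucleotides


def longest_run(seq):
    """Length of the longest stretch of identical consecutive characters."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def reverse_translate(peptide):
    """Reverse-translate a peptide using one fixed codon per amino acid."""
    return "".join(PREFERRED_CODON[aa] for aa in peptide)


def make_filler(length, rng, max_tries=10000):
    """Draw a semi-random filler by rejection sampling until the run limit holds."""
    n_gc = round(GC_TARGET * length)
    for _ in range(max_tries):
        bases = [rng.choice("GC") for _ in range(n_gc)]
        bases += [rng.choice("AT") for _ in range(length - n_gc)]
        rng.shuffle(bases)
        filler = "".join(bases)
        if longest_run(filler) <= MAX_RUN:
            return filler
    raise RuntimeError("no filler satisfying the constraints was found")


def design_insert(peptide, seed=0):
    """Return coding unit + stop codon + filler, padded to INSERT_LENGTH."""
    coding = reverse_translate(peptide)
    filler_length = INSERT_LENGTH - len(coding) - len(STOP)
    if filler_length < 0:
        raise ValueError("peptide too long for the insert length")
    return coding + STOP + make_filler(filler_length, random.Random(seed))
```

Calling `design_insert` on a 42-residue peptide yields an empty filler, since 126 coding nucleotides plus the stop codon already fill the 129 positions.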
Peptide-encoding DNA cloning
The chemically synthesized, single-stranded oligonucleotides were separated from their array and we received them as a pool. This pool was aliquoted in 10 mM Tris-Cl, 1 mM EDTA, pH 8 and deep-frozen at -80°C. The pool was amplified by polymerase chain reaction (PCR) in a 50 µl reaction using 5 ng of the template, 10 µM HPLC-purified primer 1 and primer 2 (complementary to the amplification sites), and 25 µl of Phusion® High-Fidelity PCR Master Mix with HF buffer. The amplification was performed using 25 cycles of 98°C for 15 s, 55°C for 20 s, and 72°C for 5 s. The now double-stranded peptide-encoding DNA sequences were purified using a DNA purification kit. DNA concentration was measured using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) and 500 ng of the purified product was used for a restriction digest using enzymes HindIII-HF and PstI-HF in CutSmart buffer. The digested product was again purified using a DNA purification kit and ligated to plasmid pBAD (Thermo Fisher Scientific) digested with the same enzymes.[31] This plasmid harbored the tightly controllable PBAD promoter for peptide gene expression, a pBR322 origin of replication, and a resistance gene encoding beta-lactamase. Prior to ligation, the digested pBAD was purified using a 1% agarose gel and a DNA gel recovery kit. Next, T4 ligase (800 units) was used to ligate 100 ng of cut pBAD vector and 10 ng of peptide-encoding DNA sequences in T4 ligase buffer (molar ratio of 7:1 insert:vector). The ligation mix was incubated for 14 h at 16°C. The ligation product was dialyzed in deionized water using filters (MilliporeSigma, Burlington, USA) and 1 µl of the mix was used to transform 20 µl of CloneCatcher™ Gold DH5G Electrocompetent E. coli (Genlantis, Burlington, USA) cells using electroporation. Recovered cells were plated and incubated overnight on LB agar plates supplemented with carbenicillin.
Afterward, ~500,000 colonies were washed off the plates using LB medium, and the plasmids containing the peptide-encoding DNA sequences were extracted from 2.5 × 10^9 cells using a plasmid isolation kit. An aliquot of 5 ng of these plasmids was used to transform E. coli TOP10 cells using the transformation protocol described above. A total of 1,000,000 colonies were recovered from the plates after overnight incubation by washing with LB medium, the suspension was diluted to OD = 1 with LB medium, glycerol was added to a final concentration of 20% (v/v), and aliquots of 500 million cells were stored at -80°C.
Growth experiment
Three aliquots of E. coli TOP10 harboring the peptide-encoding DNA sequences on the pBAD plasmid were thawed and added to three 1 l baffled shake flasks containing 100 ml of LB medium + 100 µg ml−1 carbenicillin. The cultures were grown for roughly 7.5 h at 37°C. When the OD reached 0.2, the cultures were supplemented with l-arabinose to a final concentration of 0.3% (w/v) to induce peptide expression. Cell samples were taken from each biological replicate at the point of induction and 1.5 h, 3 h, and 4.5 h post-induction. The plasmids were extracted from all samples using a plasmid isolation kit.
NGS
For the generation of Mex growth curves, peptide-encoding DNA sequences on plasmids, collected from the three replicates across four time points during the growth experiment, were sequenced by NGS. Additionally, the abundance of peptide-encoding DNA sequences in the original oligonucleotide pool and after transformation of the assay strain E. coli TOP10 was assessed by NGS. Peptide-encoding DNA sequences were amplified with primer 1 and primer 2 using 100 ng of plasmid and the PCR amplification protocol described above, but for only 10 cycles to avoid amplification bias. The amplification product was purified using an agarose gel. Single Index PentAdapters from Pentabase were used to prepare PCR-free libraries with the KAPA HyperPrep Kit (now Roche, Basel, CH) according to the manufacturer's specifications. Libraries were quantified using the qPCR KAPA Library Quantification Kit. Libraries were pooled and sequenced PE 2x151 with an Illumina HiSeq 2500 using v4 SBS chemistry. Roughly 10% genomic PhiX library was added as spike-in to increase sequence diversity. Basecalling was done with bcl2fastq v2.20.0.422. The resulting fastq files were processed using in-house software written in R and C. This software aligns each sequence to our reference table of 12,412 sequences linking peptide-encoding DNA sequences and peptide sequences, identifies mismatches and sequencing errors, and counts how often each peptide-encoding DNA sequence was sequenced in each sample. NGS read counts for each sequence analyzed in Mex are listed with a unique identifier (ID) in File S2.
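The counting step can be illustrated with a deliberately simplified sketch. It assumes reads have already been trimmed to the peptide-encoding insert and only counts exact matches against the reference table; the in-house R and C software additionally aligns reads and identifies mismatches, which is not reproduced here. The function name and the data layout are hypothetical.

```python
from collections import Counter


def count_reads(reads, reference):
    """Count exact matches of reads against a reference table.

    reads: iterable of DNA strings (assumed trimmed to the insert).
    reference: dict mapping peptide-encoding DNA sequence -> sequence ID.
    Returns (counts per ID, number of unmatched reads).
    """
    # Start every ID at zero so that unobserved sequences are reported too.
    counts = Counter({seq_id: 0 for seq_id in reference.values()})
    unmatched = 0
    for read in reads:
        seq_id = reference.get(read.strip().upper())
        if seq_id is None:
            unmatched += 1
        else:
            counts[seq_id] += 1
    return counts, unmatched
```

Keeping zero counts for unobserved IDs makes the resulting table directly usable as one column of a count matrix with one row per ID.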
Generation of Mex growth curves
We used the standard workflow of DESeq2 (NGS read count normalization, dispersion estimates, and Wald’s tests) to analyze NGS read counts.[32] Only sequences that passed independent filtering were included in further analyses (n = 10,633). To draw growth curves for each peptide-expressing strain, we calculated the log2-fold changes of NGS read counts (listed for each ID in File S2) between the time of induction and all other time points (1.5 h, 3.0 h, and 4.5 h post-induction). A Bayesian shrinkage estimator was employed to shrink the log2-fold change for each ID (lfcShrinkID) between all time points using the R/Bioconductor package ‘apeglm’.[33] To draw the Mex growth curves, we calculated a strain-specific ODID at each time point according to equation (1). OD values at the specific time points were averaged values from all three biological replicates (Figure S5). The ODID (0 h) for each peptide-expressing strain was set to 0.2 at the time of induction, as lfcShrinkID (0 h) = 0 and OD = 0.2. This enabled us to compare peptide-expressing strains of different abundances (see Figure S6). ODID values can be interpreted as the OD values that would have been measured when incubating the respective strain individually in the same experiment, i.e. in this case in 100 ml of LB medium in a shake flask.
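Equation (1) is not reproduced in this section. One form that is consistent with the description (ODID = 0.2 at induction, where lfcShrinkID = 0 and OD = 0.2) is ODID(t) = OD(t) × 2^lfcShrinkID(t). The following sketch assumes this form; it is an inference from the text and should not be read as the authors' exact equation.

```python
def strain_od(mean_od, lfc_shrink):
    """Strain-specific OD under the assumed relation OD_ID = OD * 2**lfc.

    mean_od: culture OD at a time point, averaged over biological replicates.
    lfc_shrink: shrunken log2-fold change of the strain's read count
                relative to the time of induction.
    """
    return mean_od * 2.0 ** lfc_shrink


# Hypothetical example: a strain whose relative abundance halved (lfc = -1)
# while the culture grew from OD 0.2 to OD 1.6.
curve = [strain_od(od, lfc) for od, lfc in [(0.2, 0.0), (1.6, -1.0)]]
```

Under this assumption, a strain with a log2-fold change of 0 at all time points simply follows the averaged OD of the pooled culture.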
To find Mex-active peptides, we also performed a one-sided Wald’s test, with the alternative hypothesis that the expression of a given peptide leads to a reduced ODID at 1.5 h and 4.5 h post-induction. We rejected the null hypothesis at significance level alpha = 0.05. Peptides with p < 0.05 (after adjustment for multiple testing using the Benjamini-Hochberg method) after 4.5 h were considered Mex-active peptides. Peptides with p < 0.05 after 1.5 h were considered to inhibit growth significantly already at this early time point. All values and results are reported in File S1.
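For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal sketch of the standard procedure is given below. It is a generic illustration in Python; in the analysis described here, the adjustment was performed within the R workflow, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# A peptide is called active when its adjusted p-value is below alpha.
alpha = 0.05
active = [p < alpha for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])]
```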
Monoseptic growth experiments
Taking the ODID (4.5 h) of each peptide-expressing strain, we could rank all peptides by their growth inhibitory effect. We selected 110 peptides (ranks 1-50, 100-119, 1000-1019, and 10,000-10,019) and, for each, generated an identical copy of the strain previously used in Mex for its expression. First, the corresponding peptide-encoding DNA sequences were synthesized as gene fragments. An aliquot of 400 ng of each gene fragment was directly used for a restriction digest using enzymes HindIII-HF and PstI-HF in CutSmart buffer. The product was purified using a DNA purification kit. Next, T4 ligase (800 units) was used to ligate 50 ng of identically digested pBAD vector and 10 ng of purified gene fragment in T4 ligase buffer for 14 h at 16°C. The ligation product was purified using a DNA purification kit. An aliquot of 5 µl of the purified ligation product was then used to transform chemically competent E. coli TOP10 cells. From the resulting colonies, we isolated one strain, sequence-verified the correct assembly of the expression plasmid, and stored it after overnight growth in glycerol at -80°C. For the growth experiment, we first re-isolated single colonies on solid media and then picked three clones, incubated them separately overnight, and inoculated them into 200 µl of fresh LB medium containing 0.3% (w/v) l-arabinose to a final OD of 0.01 in 96-well microtiter plates (Greiner). Growth was recorded by measuring OD in a Tecan Infinite 200 PRO (Tecan, Männedorf, CH) for 4.5 h (37°C, 1.5 mm orbital shaking).
Enrichment analyses
We used Fisher’s exact test to assess the over- or underrepresentation of Mex-actives in various groups. This amounts to a hypergeometric test to assess the significance of drawing n active peptides in a group of k, from a population of size N containing K active peptides. We rejected the null hypothesis at significance level alpha = 0.05. Groups with a p<0.05 had a significantly different representation of active peptides compared with the overall population. When adjusting for multiple testing, we used the Benjamini-Hochberg method.
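The hypergeometric calculation behind this test can be sketched as follows, using the symbols n, k, N, and K from the text. This is a generic illustration using only the Python standard library, not the code used for the analysis; the two-sided p-value follows the common convention of summing all outcomes that are no more probable than the observed one.

```python
from math import comb


def hypergeom_pmf(x, N, K, k):
    """Probability of x actives in a group of k drawn from N with K actives."""
    return comb(K, x) * comb(N - K, k - x) / comb(N, k)


def enrichment_p_values(n, k, N, K):
    """Return (over-representation, under-representation, two-sided) p-values."""
    lowest = max(0, k - (N - K))   # fewest actives the group can contain
    highest = min(k, K)            # most actives the group can contain
    pmf = {x: hypergeom_pmf(x, N, K, k) for x in range(lowest, highest + 1)}
    observed = pmf[n]
    over = sum(p for x, p in pmf.items() if x >= n)
    under = sum(p for x, p in pmf.items() if x <= n)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in pmf.values() if p <= observed * (1 + 1e-9))
    return over, under, min(1.0, two_sided)
```

For example, `enrichment_p_values(2, 2, 4, 2)` describes a group of two peptides that are both active in a population of four peptides containing two actives.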
Peptide classifications
The physicochemical parameters of the peptides were calculated at pH 7 using the R package ‘Peptides’ (https://cran.r-project.org/package=Peptides). For charge, we used the method by Lehninger.[34] For hydrophobicity, we used the scale by Kyte and Doolittle.[35] The information for each parent, such as the name, chemical modification, activity, and 3D structure, was extracted from the APD website (https://aps.unmc.edu/) using an in-house R script. The information on the species from which a specific peptide sequence originated was extracted from the tblastn search and the APD website. The entire taxonomic classifications (kingdom, phylum, class) for each species were extracted, if available, from the Global Biodiversity Information Facility Data Portal (https://gbif.org) using the R package ‘taxize’ (https://cran.r-project.org/package=taxize). The results are summarized in File S1.
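As an illustration of the hydrophobicity parameter, the sketch below computes the mean Kyte-Doolittle hydropathy of a peptide. It is a Python stand-in for the calculation performed with the R package ‘Peptides’; the function name is hypothetical, and averaging over all residues is assumed to correspond to the value reported by that package.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over all residues of a peptide."""
    if not peptide:
        raise ValueError("empty peptide sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)
```

Positive values indicate a predominantly hydrophobic peptide, negative values a predominantly hydrophilic one.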
Membrane damage assay using intracellularly synthesized peptides
We selected the peptide-expressing strains of rank 1-50 in Mex that we had previously constructed for the monoseptic growth assay. Additionally, we selected the strain expressing the inactive control peptide HNP-13425 APD, a peptide known to be inactive if expressed in E. coli.[8] Each strain was re-isolated on solid media from frozen stock and incubated overnight. Then, two colonies were picked and incubated overnight in 96-deep-well polypropylene plates. These cultures were used to inoculate fresh media containing 0.3% (w/v) l-arabinose to a final OD of 0.01 in 96-well microtiter plates. The plates were then incubated on a Tecan Infinite 200 PRO plate reader for 4.5 h (37°C, 1.5 mm orbital shaking). After 4.5 h, an aliquot of 50 µl of cell suspension was added to 150 µl of phosphate-buffered saline in a fresh 96-well microtiter plate. Propidium iodide (PI) was added to a final concentration of 1 µg ml−1. PI is a DNA-intercalating dye that cannot pass an intact cytoplasmic membrane.[36] For each sample, the PI fluorescence (λEx = 579 nm / λEm = 616 nm) of ~10,000 cells was analyzed using a flow cytometer LSR Fortessa (BD Biosciences, Allschwil, CH). To determine the membrane-damaging properties of each of the expressed peptides, we calculated the percentage of cells for which PI uptake was measured using the software FlowJo V10 (BD Biosciences).
Stress response assay using intracellularly synthesized peptides
We selected peptide-expressing strains of rank 1-50, previously generated for the monoseptic growth assay. Additionally, we selected the strain expressing the inactive control peptide HNP-13425 APD. Moreover, two plasmids (cloning vector: pUA66) containing either the promoter of the gene for recombinase A (PrecA) or the promoter of the gene for cold shock protein A (PcspA) were purified from the E. coli Alon collection.[15] Both plasmids contained a transcriptional fusion of their promoter with a downstream gene for green fluorescent protein (gfp), an additional kanamycin resistance cassette, and a pSC101 origin of replication. We transformed each of the 51 peptide-expressing E. coli strains with each of the two plasmids to generate 102 different strains and incubated them overnight on solid media. Then, three colonies were picked and incubated overnight. These cultures were used to inoculate fresh media containing 0.3% (w/v) l-arabinose to a final OD of 0.05 in 96-well microtiter plates. We recorded OD and GFP expression (λEx = 488 nm / λEm = 530 nm) after 1.5 h and 4.5 h using a Tecan Infinite 200 PRO (37°C, 1.5 mm orbital shaking). For each strain, we calculated the specific fluorescence change between the two time points (GFP/OD (4.5 h) - GFP/OD (1.5 h)). Statistical significance was calculated by one-sided t-tests, adjusted for multiple testing by Benjamini-Hochberg, using the signal of HNP-13425 APD as null distribution. We rejected the null hypothesis at significance level alpha = 0.05.
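The specific fluorescence change defined above is a simple difference of OD-normalized GFP signals. A minimal sketch, with a hypothetical function name and made-up example readings:

```python
def specific_fluorescence_change(gfp_early, od_early, gfp_late, od_late):
    """GFP/OD at the late time point minus GFP/OD at the early time point."""
    return gfp_late / od_late - gfp_early / od_early


# Hypothetical plate-reader values at 1.5 h and 4.5 h for one replicate.
change = specific_fluorescence_change(
    gfp_early=100.0, od_early=0.1, gfp_late=900.0, od_late=0.5
)
```

Values from the three replicate cultures of a strain would then be compared against those of the control strain in the t-test described above.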
Purification of chemically synthesized peptides
Peptides were obtained from Pepscan (Lelystad, NL) in >90% purity or in crude format and subsequently purified to >90% purity in-house. For the latter, crude peptides were dissolved in 5 ml DMSO and 15 ml 0.1% aqueous trifluoroacetic acid (TFA). HPLC purification of the dissolved crude peptides was performed on an ÄKTAexplorer chromatography system (GE Healthcare, SE). The entire peptide sample was loaded onto an RP C18 column (PRONTOSIL 120 C18 10 µm, 250 x 20 mm, 50 x 20 mm precolumn, Bischoff, Leonberg, DE), heated to 30°C and operated at a flow rate of 10 ml min−1 using 0.1% aqueous TFA as solvent A and acetonitrile supplemented with 0.1% TFA as solvent B. The ratios of A to B were adapted for each peptide and typical values are given below. The column was equilibrated with the peptide-specific mixture of solvent A and solvent B (0-20%) prior to injection. After injection and an initial wash step of 6 min with the same mixture, a gradient was applied, in the course of which the amount of solvent B was increased to 50-0 % in 40 min. The column was washed with 5 % solvent B for 8 min and equilibrated with the specific solvent A/solvent B mixture for the next run for 13 min. Peptide elution was monitored spectrophotometrically at 205 nm, and generally the main peptide peak was collected. The sample was frozen at -80°C for >2 h and lyophilized (approx. 18 h) using a freeze-dryer (Alpha 2-4 LDplus, Christ, DE) connected to a vacuum pump (RC6, Vacuubrand, DE). The lyophilized peptides were dissolved in 1 ml DMSO and stored at -20°C. The concentration of the peptide stocks was determined via HPLC using an Agilent 1200 series HPLC system. Each peptide stock was analyzed as a 1:100 dilution in water. An aliquot of 10 µl of the peptide stock was injected onto an RP-C18 column (ReproSil-Pur Basic C18, 50 x 3 mm, Dr. Maisch, Germany) operated with water supplemented with 0.1% TFA as solvent A and acetonitrile supplemented with 0.1% TFA as solvent B. Separation was performed using the same concentration profile previously used for purification. The concentration was measured using the integrated peak area at 205 nm and then calculated using peptide-specific absorption properties.[37, 38]
Measurement of the MIC using chemically synthesized peptides
On the day on which MIC assays were performed, purified peptides were thawed and the concentration was determined by HPLC as described before. E. coli TOP10 cells were grown in Mueller Hinton Broth (MHB) or diluted MHB (25% of the original strength) overnight to stationary phase. Diluted MHB has been frequently used to assay antimicrobial peptides.[39] The cultures were then supplemented with 20% glycerol, aliquoted, and frozen at -80°C. For MIC measurements, a frozen stock of the cells was thawed, resuspended in MHB or 25% MHB to adjust to a density of 5 × 10^5 cells ml−1 in the experiment, and distributed to microtiter plate wells by an automated liquid handling system (Hamilton, Bonaduz, CH). Then the peptides were added by the liquid handling system in 2-fold dilutions using a minimum of 100 µg ml−1 as the highest concentration. MICs were determined by broth microdilution assay in 384-well flat bottom polypropylene plates (Falcon® 96-Well Flat-Bottom Microplate), adapted from the protocol of Wiegand et al.[40] The plates were sealed airtight and incubated for 18 h without shaking at 37°C before reading the OD using a Tecan Infinite 200 PRO plate reader. The MIC value corresponded to the concentration at which no growth of the bacterial strain was observed (< 5% of the OD value of the growth control). MIC experiments were performed at least in triplicate.
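The MIC read-out can be expressed as a short sketch. It assumes blank-corrected OD values and takes the MIC as the lowest concentration in the uninterrupted block of no-growth wells counted down from the highest concentration; both choices, as well as the function name, are assumptions made for illustration.

```python
def determine_mic(concentrations, od_values, growth_control_od, cutoff=0.05):
    """Return the MIC, or None if growth occurs at the highest concentration.

    concentrations: peptide concentrations of the dilution series.
    od_values: blank-corrected OD readings, one per concentration.
    growth_control_od: blank-corrected OD of the peptide-free growth control.
    cutoff: fraction of the control OD below which a well counts as no growth.
    """
    mic = None
    # Walk from the highest concentration downwards until growth is seen.
    for conc, od in sorted(zip(concentrations, od_values), reverse=True):
        if od < cutoff * growth_control_od:
            mic = conc
        else:
            break
    return mic
```

For a 2-fold series of 100, 50, 25, 12.5, and 6.25 µg ml−1 with growth only in the two lowest concentrations, the function returns 25.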
Membrane damage assay using chemically synthesized peptides
To measure extracellular membrane damage, E. coli ATCC 25922 [pSEVA271-GFP] and the peptide dilutions were prepared as described for the MIC measurements but covering a concentration range of 16 x MIC to MIC/16 in 2-fold dilution steps with a final assay volume of 200 µl. The bacterial suspension was furthermore supplemented with propidium iodide to a final concentration of 1 µg ml−1 just before pipetting the assay. After 1 h of incubation at room temperature, membrane damage (= release of intracellularly expressed GFP and/or uptake of extracellularly added PI) was assessed by flow cytometry using a Fortessa Analyzer (BD Biosciences; 488 nm laser with 530/30 nm bandpass filter and 579 nm laser with 610/20 nm bandpass filter). The fractions of PI-positive and GFP-positive cells were determined with the same gate for all populations using the FlowJo V10 software (BD Biosciences). The extracellular membrane integrity assay was performed in biological duplicates, analyzing at least 10,000 cells in each experiment.
Hemolysis assay using chemically synthesized peptides
Two samples of human blood were obtained from a blood bank (Blutspendezentrum SRK at the University Hospital Basel). The samples were pooled and erythrocytes were isolated by repeated centrifugation at 500 x g for 10 min, removal of the blood plasma, and resuspension of the remaining cells in an equal volume of DPBS. Following the last resuspension, erythrocytes were diluted 1:50 in DPBS. For the hemolysis assay, a log2 serial dilution of each peptide was prepared as described for the MIC but using DPBS and a 96-well plate (U-bottom, PP, 650201, Greiner) with a final volume of 200 µl. As lysis control, 2.5% Triton X-100 in DPBS was used in well 10; well 11 served as non-treated control (no peptide added) and well 12 as blank. To each well of the dilution plate, 100 µl of the red blood cell suspension was added. The plate was incubated for 1 h at 37°C. After the incubation, the plate was centrifuged at 500 x g for 10 min and 100 µl of the supernatant was transferred to a clean 96-well plate (F-bottom, PS, 655101, Greiner). The absorbance was measured at 540 nm using an Infinite M1000 PRO plate reader (Tecan) and corrected by the measurements from the untreated wells. The lysis at each peptide concentration was expressed relative to the lysis control (set as 100% lysis). The hemolysis assay was performed in triplicate.
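The normalization of the absorbance readings can be sketched as below. The sketch assumes that the untreated-well absorbance is subtracted from both the sample and the lysis control before forming the ratio; the text states the correction only for the measurements in general, so this detail and the function name are assumptions.

```python
def percent_hemolysis(a_sample, a_untreated, a_lysis_control):
    """Hemolysis in percent relative to the Triton X-100 lysis control.

    a_sample: absorbance at 540 nm of a peptide-treated well.
    a_untreated: absorbance of the non-treated control (no peptide).
    a_lysis_control: absorbance of the lysis control (defined as 100% lysis).
    """
    span = a_lysis_control - a_untreated
    if span <= 0:
        raise ValueError("lysis control must exceed the untreated control")
    return 100.0 * (a_sample - a_untreated) / span
```

With this definition, a well that matches the untreated control gives 0% and a well that matches the lysis control gives 100%.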