Methods
Results and discussion
Directed evolution of ZfQQR variants
Directed evolution of the ZfQQR zinc finger was attempted by creating libraries of
variants using saturation mutagenesis by codon cassette insertion in
target regions. T7 phage display was used to link phenotype to genotype,
and the desired variants were selected from the libraries by biopanning.
For this purpose, a modified version of the zfqqr gene with unique restriction sites around the target regions was cloned into a T7
phage downstream of the 10B capsid protein. To confirm that the T7ZfQQR recombinant
phage expresses a functional zinc finger, biopanning was performed using a control
mixture of T7 and recombinant T7ZfQQR phages at a ratio of 100:1. The control mixture
was incubated in a well with immobilized biotinylated substrate containing the ZfQQR recognition
sequence and in an empty streptavidin-coated well. After only two rounds of
biopanning, the T7:T7ZfQQR ratio was 5:8 in phages recovered from the substrate-coated well and 95:1
in those from the empty well. This enrichment of the initial mixture in T7ZfQQR phages
confirmed the functionality of the expressed zinc finger and the effectiveness of the
selection method.
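The enrichment implied by these ratios can be worked out directly. The following is a minimal sketch, not part of the original analysis; the helper name `enrichment` is hypothetical, and the ratios are taken as written in the text (T7:T7ZfQQR).

```python
from fractions import Fraction

def enrichment(input_ratio, output_ratio):
    """Fold change of the recombinant-to-wild-type ratio after selection.

    Each ratio is a (wild_type, recombinant) pair, i.e. T7:T7ZfQQR.
    """
    wt_in, rec_in = input_ratio
    wt_out, rec_out = output_ratio
    return Fraction(rec_out, wt_out) / Fraction(rec_in, wt_in)

# Input mixture 100:1; after two rounds 5:8 (substrate well) and 95:1 (empty well).
substrate_fold = enrichment((100, 1), (5, 8))
empty_fold = enrichment((100, 1), (95, 1))

print(f"substrate well: {float(substrate_fold):.0f}-fold")
print(f"empty well:     {float(empty_fold):.2f}-fold")
```

Under these numbers the substrate-coated well enriches T7ZfQQR 160-fold over two rounds, while the empty well leaves the ratio essentially unchanged.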
Three libraries of genes encoding ZfQQR zinc finger variants, Zfm2, L5 and L6, were
constructed. The Zfm2 library was designed to select domains with altered sequence
specificity by randomization of the residues involved in direct interaction with the
substrate (Q56, S58, N59 and K62) in the second zinc finger module (Fig. 1B). The
L5 and L6 libraries were designed to enable selection of variants that are more selective
towards DNA-RNA hybrids with the target sequence by randomizing the sequence
of the Zfm2-Zfm3 linker. In library L5 (Fig. 1B), the fragment coding for five amino acid
residues (TGEKP) was randomized, whereas in library L6 the randomized fragment
was extended to six residues (Fig. 1B). The rationale for extending the linker
was that the structure of the DNA-RNA hybrid helix is intermediate
between two forms: A, with 11 base pairs per turn, and B, with 10.5 base pairs per turn.
It is slightly more compact than the B form of dsDNA
[14]. A longer and more flexible linker might enable the modules to wrap around the DNA-RNA
helix and fit the compressed structure better than a shorter and more rigid one.
In all libraries, the selected codons were replaced by a degenerate NNS codon. After
ligation of the library cassettes to the T7ZfQQR construct and in vitro packaging, 3.3 × 10^5 pfu/ml, 2.4 × 10^5 pfu/ml and 2.1 × 10^6 pfu/ml of recombinant phages were obtained for libraries Zfm2, L5 and L6, respectively.
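To put these titers in context, the theoretical diversity of each NNS library can be computed from the number of randomized positions (four for Zfm2, five for L5, six for L6). This is an illustrative sketch only: the names below are hypothetical, and because the packaged volume is not stated here, the titers are compared per millilitre rather than as total phage counts.

```python
# NNS: N = any of 4 bases, S = C or G, giving 4 * 4 * 2 = 32 codons
# that cover all 20 amino acids plus one stop codon (TAG).
NNS_CODONS = 4 * 4 * 2
AMINO_ACIDS = 20

# Randomized positions per library and titers (pfu/ml) as reported in the text.
libraries = {
    "Zfm2": {"positions": 4, "titer_pfu_per_ml": 3.3e5},
    "L5": {"positions": 5, "titer_pfu_per_ml": 2.4e5},
    "L6": {"positions": 6, "titer_pfu_per_ml": 2.1e6},
}

diversity = {}
for name, lib in libraries.items():
    n = lib["positions"]
    diversity[name] = {
        "codon_variants": NNS_CODONS ** n,
        "peptide_variants": AMINO_ACIDS ** n,
        # Phages per ml relative to the number of codon combinations.
        "coverage_per_ml": lib["titer_pfu_per_ml"] / NNS_CODONS ** n,
    }

for name, d in diversity.items():
    print(f"{name}: {d['codon_variants']:,} codon combinations, "
          f"{d['peptide_variants']:,} peptides, "
          f"coverage per ml = {d['coverage_per_ml']:.4f}")
```

In every case the titer per millilitre is below the number of possible codon combinations, so full coverage would require a correspondingly larger packaged volume.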
Selection of variants from the Zfm2 library was carried out in parallel on a set of
64 biotinylated DNA-RNA hybrid substrates, each having a different possible variation
of the three middle nucleotides in the recognition site (Fig. 1A, see Table S1 in
Additional file 1). This approach was aimed at determining the recognition code for
binding DNA-RNA hybrids. Libraries L5 and L6 were selected using the original ZfQQR
binding sequence. Phage libraries were biopanned for five rounds, and the phage titer
after each round varied from 10^5 to 10^7. The material after biopanning, the input libraries and a negative control (phage library
Zfm2 biopanned on a surface without substrate) were sequenced using an Illumina MiSeq sequencer.
On average, 67 thousand reads with the correct length and the correct sequences flanking
the randomized regions were obtained per sample. The distribution of the degenerate NNS sequence
in the input Zfm2 library was uneven. The predominant codons encoded mainly P, F,
L and V residues, accounting for around 50% of reads, whereas the theoretical frequency
should be around 25% (see Table S1 in Additional file 2). The most frequently appearing
sequence encoded the motif PPPP and was present in 4.5% of all the filtered reads.
For the input libraries L5 and L6, no bias in the amino acid distribution was observed
(see Table S2 and S3 in Additional file 2).
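The theoretical frequency of about 25% for P, F, L and V follows from enumerating the 32 NNS codons under the standard genetic code. The sketch below reproduces that figure and, as an added illustration not taken from the original analysis, the expected frequency of a PPPP motif in an unbiased library; all variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered TCAG at each position ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# NNS: any base at positions 1 and 2, C or G at position 3.
nns_codons = [a + b + c for a, b, c in product(BASES, BASES, "CG")]
counts = Counter(CODON_TABLE[codon] for codon in nns_codons)

# Expected combined frequency of P, F, L and V at one randomized position.
pflv = Fraction(sum(counts[aa] for aa in "PFLV"), len(nns_codons))

# Expected frequency of PPPP across four independent NNS positions.
pppp = Fraction(counts["P"], len(nns_codons)) ** 4

print(f"P+F+L+V expected per position: {float(pflv):.2%}")
print(f"PPPP expected: {float(pppp):.2e}")
print(f"observed 4.5% is about {0.045 / float(pppp):.0f}-fold above expectation")
```

The enumeration gives 8 of 32 codons (25%) for P, F, L and V, and an expected PPPP frequency of 1 in 65,536, far below the 4.5% observed in the input Zfm2 library.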
In the case of variants derived after selection from library Zfm2 and from the negative
control, a very similar distribution of amino acids was observed irrespective of the
sequence of the substrate used for biopanning or the presence of the substrate during
the selection (see Table S1 in Additional file 3). All samples had a similar consensus
sequence FVLL (example in Fig. 2A), where the consecutive letters of the motif correspond
to the residues in the native protein Q56, S58, N59 and K62. Distribution of the amino
acid residues in all sequenced samples resembled to a large extent the distribution
of the input library Zfm2 (see Table S1 in Additional file 3). The most prominent
change observed in the isolated variants was the lower frequency of the PPPP
motif. Most likely the selection pressure disfavored the presence of a conformationally
rigid residue because of the steric hindrance in the structure of the alpha helix
in the zinc finger module [15, 16]. It is also possible that the presence of four
proline residues affected the folding of the zinc finger fusion with the phage capsid
and, as a result, these variants were eliminated during consecutive rounds of biopanning.
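A consensus motif such as FVLL can be derived by taking the most frequent residue at each randomized position. The following is a minimal sketch of that idea; the function name and the toy motifs are hypothetical and are not the sequencing data from this study.

```python
from collections import Counter

def consensus(motifs):
    """Return the most frequent residue at each position of equal-length motifs."""
    if len({len(m) for m in motifs}) != 1:
        raise ValueError("motifs must be non-empty and of equal length")
    # zip(*motifs) yields one tuple of residues per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*motifs)
    )

# Toy example (invented motifs for illustration only).
toy_motifs = ["FVLL", "FVLP", "PVLL", "FLLL", "FVVL"]
toy_consensus = consensus(toy_motifs)
print(toy_consensus)
```

A per-position consensus does not guarantee that the consensus motif itself is the most frequent full-length sequence, so it is best read alongside the ranked list of complete motifs.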
The above results may be caused by several factors. The starting material in library
Zfm2 had an uneven distribution of variants, so the overrepresented variants may have
outcompeted the less frequent ones by occupying the binding sites. Perhaps the selection pressure
was insufficient for the applied strategy of randomization of the middle zinc finger
module. The binding of the first and third module might have been strong enough to
withstand the selection. This is further supported by the fact that similar sequences
were isolated from libraries selected on almost all of the substrates. Additionally,
interaction with bases in nucleic acids has rarely been observed for this group of amino
acid residues [17]. Another possibility is that the lack of binding in the central module promotes binding
of the DNA-RNA hybrid structure by eliminating a steric hindrance that may arise from
its specific interaction.
Sequencing of the variants derived from the selection of library L5 revealed that
the predominant isolated amino acid sequence was TRERN (17% of obtained sequences,
see Fig. 2B). For library L6 the sequence NQMMRK (9% of obtained sequences, see Fig.
2C) was most frequently observed. Neither of these two amino acid sequences appeared
in the results from sequencing of the input libraries, which means that they were
present less frequently than 1 in 55162 for the L5 library and 1 in 42323 for the
L6 library. Interestingly, in library L5 the sequence NQMRP,
which partially resembles the one isolated from the L6 library, was the fourth most
frequent (Fig. 2B).
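These figures give a rough lower bound on the enrichment achieved by selection. The sketch below assumes, as the text does, that a sequence absent from N input reads had an input frequency below 1/N; it ignores sampling error, and the function name is hypothetical.

```python
def min_enrichment(selected_fraction, input_reads):
    """Lower bound on fold enrichment for a sequence unseen in the input reads.

    The input frequency is taken to be below 1 / input_reads, so the
    enrichment exceeds selected_fraction / (1 / input_reads).
    """
    return selected_fraction * input_reads

# Fractions after selection and input read counts as reported in the text.
trern_fold = min_enrichment(0.17, 55162)   # TRERN, library L5
nqmmrk_fold = min_enrichment(0.09, 42323)  # NQMMRK, library L6

print(f"TRERN:  more than {trern_fold:,.0f}-fold")
print(f"NQMMRK: more than {nqmmrk_fold:,.0f}-fold")
```

Both motifs were therefore enriched by at least three orders of magnitude over five rounds of biopanning, under the stated assumption.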
Binding affinity and selectivity of the isolated variants
The binding affinity of zinc finger variants selected using directed evolution was
determined. For the Zfm2 library, the consensus sequence was chosen, and zinc finger
variants were obtained that carry the Q56F S58V N59L K62L substitutions either in
Zfm2 only (termed ZfFVLL) or in both Zfm2 and Zfm3 (additional substitutions Q28F S30V N31L K34L,
termed Zf2xFVLL). The most frequently observed amino acid motifs obtained
for the libraries L5 and L6 were introduced into the Zfm2-Zfm3 linker (termed ZfTRERN
and ZfNQMMRK, respectively) and, additionally, into the Zfm1-Zfm2 linker (termed Zf2xTRERN
and Zf2xNQMMRK, respectively).
For native ZfQQR and each protein variant, the equilibrium dissociation constant (KD)
was measured using surface plasmon resonance (Fig. 3A). The KD for the ZfFVLL and Zf2xFVLL zinc fingers was above 5000 nM and could not be measured
using this method because the proteins aggregated in the assay buffer at concentrations
above 2 µM. This binding result, together with the sequencing results obtained from
selection using a panel of 64 substrates and from the negative control, supports
the explanation that the input library bias along with insufficient selection pressure
hampered the biopanning. It is most likely that the selected variants result from
the background nonspecific binding of phage particles to the streptavidin-coated wells.
The KD of the ZfTRERN and ZfNQMMRK variants was slightly higher than that of ZfQQR (Fig. 3B).
However, when the motifs were repeated in the Zfm1-Zfm2 linker, the variants had 10-fold
and 40-fold higher KD than the single motif variants. This result indicates that the effect of linker
engineering is position specific and that each linker should be optimized
separately.
To determine whether the zinc finger variants had an improved ability to
discriminate between DNA-RNA hybrids and dsDNA, their relative binding to the
substrate with the 5′GGGGAAGAA3′ sequence in the presence of a 100-fold excess of a
dsDNA competitor (containing the 5′GGGGAAGAA3′ sequence) was measured using a nitrocellulose
filter binding assay. All single motif and double motif variants displayed at least
2-fold higher relative binding of the DNA-RNA hybrid than the original ZfQQR (Fig.
3C). Although the variants display a higher KD (lower affinity) than ZfQQR, their selectivity for DNA-RNA hybrids over dsDNA improved. This may
indicate that further optimization of the preference for DNA-RNA hybrid over dsDNA binding
is achievable and that it is distinct from optimization of the sequence selectivity.
Limitations
Sequence bias in the Zfm2 input library resulted in overrepresentation of the P, F,
L and V codons. The number of phage particles obtained after in vitro packaging was insufficient to represent all the possible codon combinations in the
theoretical library. Affinity binding measurements using surface plasmon resonance
were done as single experiments.