Unidirectional single-file transport of full-length proteins through a nanopore

The electrical current blockade of a peptide or protein threading through a nanopore can be used as a fingerprint of the molecule in biosensor applications. However, threading of full-length proteins has only been achieved using enzymatic unfolding and translocation. Here we describe an enzyme-free approach for unidirectional, slow transport of full-length proteins through nanopores. We show that the combination of a chemically resistant biological nanopore, α-hemolysin (narrowest part is ~1.4 nm in diameter), and a high concentration guanidinium chloride buffer enables unidirectional, single-file protein transport propelled by an electroosmotic effect. We show that the mean protein translocation velocity depends linearly on the applied voltage and translocation times depend linearly on length, resembling the translocation dynamics of ssDNA. Using a supervised machine-learning classifier, we demonstrate that single-translocation events contain sufficient information to distinguish their threading orientation and identity with accuracies larger than 90%. Capture rates of protein are increased substantially when either a genetically encoded charged peptide tail or a DNA tag is added to a protein. Full-length, unfolded proteins are slowly translocated through nanopores without enzymes and fingerprinted.

High-throughput and long-read genomic sequencing 1 methods include several single-molecule techniques 2 in which the nucleotide sequence of individual DNA molecules is determined either by monitoring DNA replication in real time 3 or by passing a DNA strand through a nanopore detector 4,5. But molecular characterization of the >10^4 proteins in the canonical human exome, with a multitude of protein isoforms 6,7 and post-translational modifications (PTMs) 8,9, requires new quantitative methods for protein counting, sequencing and discrimination among various isoforms and PTMs 10. Mass spectrometry (MS) is currently the gold standard method for these characterizations, which commonly require protein fragmentation, quantification and sequence reconstruction. The high sensitivity of MS to minute protein quantities permits single-cell proteomics 11. However, low peptide ionization rates and other limitations result in only a few percent sampling efficiencies 12,13. Alternative approaches are needed to deliver a complete single-cell proteome without extensive fragmentation.
Nanopores have emerged as a key component of new proteomics tools 14. The methodology of nanopore DNA sequencing has been adapted to sense the amino acid composition of model peptides and proteins [14][15][16][17][18][19]. However, biological proteins are much more complex than DNA, given their diverse secondary structures, strong intramolecular interactions and heterogeneous distribution of the electrical charge along their polypeptide chains. Voltage [20][21][22][23], temperature 24 and chemical denaturation 16,[25][26][27][28][29][30] have been used to enhance the access of
Article https://doi.org/10.1038/s41587-022-01598-3
deterministic translocation process mediated by the insertion of the D10 tail, which is in agreement with earlier reports where charged tags, such as DNA oligomers 22 or poly-aspartate tails 32, were used to direct protein capture. However, modifying a full-length protein without any prior genetic engineering is crucial for a path to analysis of native proteins. We therefore demonstrate that DNA conjugation to a full-length protein tail is possible, albeit at low yields (~5% to 6%), by labeling WT-MBP with a dT20 DNA oligo (Supplementary Fig. 1 and Supplementary Note 1). We confirmed that the addition of the dT20 tail enhances threading of MBP through the pore, as indicated by an enriched cluster of events with similar dwell times and current blockades as observed for MBP-D10 (compare Fig. 1e with Supplementary Fig. 2).

Influence of GdmCl concentration on complete protein unfolding
Bulk measurements suggest that MBP has an unfolding midpoint at 1.0 M GdmCl at room temperature and that MBP fully denatures when GdmCl concentration exceeds ~1.2 M 41,42. After adding MBP-D10 to the cis chamber, in Fig. 2a, the current blockade versus dwell time scatter plots (for 1.0 M, 1.5 M and 2.0 M GdmCl, respectively) show two main distributions, highlighted by dashed red and blue circles, in addition to a very 'fast' population at ~100 µs, which most likely corresponds to protein collisions with the pore entrance. Consistent with the scatter plots, the current traces in Supplementary Fig. 4 show two types of events (long and short). We assign the long-lived events (red dashed circle) to population P_F, which encompasses events from partially folded proteins. For the reasons described in the next paragraph, we ascribe the tight, shorter-lived population P_L (blue dashed circle) to events produced by linearized, completely unfolded proteins. Since these two populations are generally well-resolved, we have determined the percentage of linear protein events P_L as a function of GdmCl concentration (Supplementary Fig. 17). As shown in Fig. 2a, P_L increases with the GdmCl concentration from ~36% at 1.0 M GdmCl to >93% at 2.0 M GdmCl. The existence of one population at 2.0 M GdmCl suggests that the protein is fully unfolded during its translocation through the pore.
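The percentage of linear events can be illustrated with a minimal sketch. It assumes, as a simplification, that the three populations are separated by dwell time alone. The thresholds below are hypothetical, loosely based on the ~100 µs collisions and the 1-10 ms dwell-time window reported for MBP-D10, whereas the populations in Fig. 2a are identified as clusters in a two-dimensional scatter plot. Taking the percentage relative to P_L plus P_F events is also an assumption.

```python
from collections import Counter

# Hypothetical dwell-time boundaries (ms). The populations in the text are
# clusters in a blockade-versus-dwell-time plot, not fixed thresholds.
COLLISION_MAX_MS = 1.0   # assumed upper bound for fast collision events
LINEAR_MAX_MS = 10.0     # assumed upper bound for the P_L population


def label_event(dwell_ms):
    """Assign one event to 'collision', 'P_L' or 'P_F' by dwell time only."""
    if dwell_ms < COLLISION_MAX_MS:
        return "collision"
    if dwell_ms <= LINEAR_MAX_MS:
        return "P_L"
    return "P_F"


def linear_fraction(dwell_times_ms):
    """Percentage of P_L events among translocation events (P_L + P_F)."""
    counts = Counter(label_event(t) for t in dwell_times_ms)
    translocations = counts["P_L"] + counts["P_F"]
    if translocations == 0:
        raise ValueError("no translocation events found")
    return 100.0 * counts["P_L"] / translocations


# Illustrative dwell times (ms), not measured data.
example = [0.10, 0.12, 3.0, 4.5, 5.2, 40.0]
print(linear_fraction(example))
```

In this toy input, two events are treated as collisions and ignored, three fall in the P_L window and one in P_F.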

Evidence of steady voltage-driven protein translocations
Figure 2b plots the mean dwell time of the P_L population as a function of voltage for experiments conducted using an MBP monomer (MBP-D10) or a dimer (diMBP-D10) (see Supplementary Fig. 11 for the dwell time histograms). The average dwell time of population P_L decreases as the voltage increases. The dwell time for the MBP dimer (diMBP-D10) is a factor of 2 larger than that for the MBP monomer (MBP-D10). In contrast, the dwell time of events attributed to partially folded proteins, P_F, is much larger for both MBP-D10 and diMBP-D10 than for the unfolded events at all voltages and exhibits a steeper voltage dependence than the P_L population (Supplementary Fig. 12). The plot of the protein mean 'velocity' (Fig. 2c), calculated by dividing the protein contour length (0.34 nm per amino acid) by the dwell time (from Fig. 2b), shows a linear dependence on voltage for both monomeric and dimeric MBP, thus indicating that the velocity does not depend on protein length. The electrophoretic mobility of a protein is described by
$$\mu = \frac{v_e}{E} = \frac{d}{tE} \qquad (1)$$
where d is the protein's contour length and d/t = v_e is the protein translocation velocity 43. Estimating the electric field E as the ratio of the voltage V to the length of the α-hemolysin lumen, D = 5 nm, we obtain
$$\mu = \frac{v_e}{V}\,D \qquad (2)$$
where v_e/V is the slope of the fits from Fig. 2c. Using Eq. 2, we find the electrophoretic mobility of MBP-D10 and diMBP-D10 at
unfolded proteins to the nanopore constrictions. However, protein unfolding does not guarantee protein translocation because an unmodified protein cannot be electrophoretically driven through a nanopore in the same manner that a charged DNA in an electric field produces a steady electromotive force 31. Enzyme-assisted unfolding and translocation of large proteins have been demonstrated using ClpXP as a motor to linearize and pull the protein through α-hemolysin 32,33. Recent works using phi29 DNA polymerase 34 or Hel308 DNA helicase 35,36 have shown enzyme-mediated ratcheting or unwinding of peptide-DNA conjugates through the MspA nanopore. However, to date, enzyme-free readout of a full-length protein using a nanopore has not been achieved.
Here we demonstrate an enzyme-free platform for single-file unidirectional transport of full-length proteins through a nanopore reader. Electroosmotic flow, enhanced by the use of guanidinium chloride (GdmCl) as a denaturant, drives protein transport through the nanopore, conferring uniform and slow (~10 µs per amino acid) single-file protein transport. The ionic current signals produced by the transport are found to carry information about the protein sequence, which we demonstrate by matching the signals produced by the N-terminus and C-terminus transport of the same protein, the transport of a double concatemer of the same protein, and by determining the composition of a binary protein mixture. With further development, our approach paves the way for single-molecule protein identification and quantification of protein isoforms.

Experimental setup for unfolding and transporting full-length protein through a nanopore reader
Our experimental setup (Fig. 1a) comprises a wedge-on-pillar (WOP) membrane support 37, a poly(1,2-butadiene)-b-poly(ethylene oxide) (PBD_n-PEO_m) block-copolymer bilayer that spans the aperture, and an inserted wild-type (WT) α-hemolysin channel, which is the nanopore used in our experiments. We chose this block-copolymer membrane for its chemical compatibility with GdmCl buffers and high voltage tolerance (>350 mV for 100-µm diameter bilayer membranes) when combined with the WOP support 38. Also depicted in the figure is our use of high-concentration GdmCl buffer, critical for protein unfolding. Figure 1b shows current versus voltage curves for single α-hemolysin channels at different buffer conditions. All curves exhibit significant asymmetry, with higher current amplitudes at positive voltages. The impact of GdmCl on noise in α-hemolysin is moderate: the 10 kHz bandwidth noise at 300 mV is 7.2 pA for 2.5 M KCl and 10.7 pA for 1 M KCl + 2.0 M GdmCl. Power spectra at 0 and 300 mV for these two buffers (see inset) indicate a slight increase in the noise in the low-to-intermediate frequency regime (<5 kHz).
Using the above experimental setup, we conducted nanopore translocation experiments on variants of the α-helix-rich maltose-binding protein (MBP) in its monomeric form (denoted as either MBP-D10 or D10-MBP, depending on the C- or N-terminus attachment of the D10 tail) and its dimeric form (diMBP-D10, with a GGSG linker between two MBP monomers). The nanopore translocation experiments were repeated using green-fluorescent protein (GFP), whose stable β-barrel structure makes this protein notoriously challenging to unfold. Figure 1c shows the net charge (at pH 7.5) 39 and length (number of amino acids) of each protein, and pKa-based graphical profiles of the charge 39 and relative volume 40 of each amino acid residue.
The impact of a charged tail on protein threading is indicated in Fig. 1d,e, where current traces of WT-MBP and MBP-D10 are shown. While in both experiments, protein concentration was 350 nM, the capture rates for MBP-D10 (9.3 s⁻¹ µM⁻¹) were ~80% higher than for WT-MBP (5.2 s⁻¹ µM⁻¹). Further, WT-MBP events had a broad range of amplitudes and short durations (<1 ms), whereas MBP-D10 had nearly 79% of the events forming a tight distribution characterized by a ~85% fractional blockade and dwell times between 1 ms and 10 ms. Such reproducible dwell times and current amplitudes suggest a
2c), indicating faster translocating speeds. The uniform translocation speed and its linear relationship with voltage resemble ssDNA translocation through α-hemolysin [44][45][46]. Finally, a plot of the event rates (Supplementary Fig. 18) reveals a low-voltage regime characterized by short-lived high-frequency collisions and an exponentially increasing capture rate at higher voltages (V > 150 mV), which suggests an entropic barrier for capture 47.
In Fig. 2d, we present dwell time distributions for GFP-D10 (254 amino acids), MBP-D10 (389 amino acids), D10-MBP (389 amino acids) and diMBP-D10 (764 amino acids) obtained at 175 mV applied voltage and 2 M GdmCl denaturant concentrations. We found no dependence of the protein transport time on its orientation of entry (MBP-D10 versus D10-MBP), which differs from the orientation dependence of DNA transport through α-hemolysin 48. This is supported by Fig. 2e, which shows current blockade versus dwell time scatters for MBP-D10 and D10-MBP. To a large extent, there is an overlap in the dwell time distributions, except that D10-MBP is captured less effectively, resulting in a greater fraction of collisions that appear as short-lived pulses (~100 µs) with lower current blockades. However, it is noteworthy that protein translocation from either direction proceeds with the same speed. Drift velocities for all molecules in this study are in the range of 0.031-0.04 nm µs⁻¹. The extended backbone distance between amino acids in a protein chain (0.34 nm) 49 translates to a mean residence time of ~10 µs per amino acid in the pore. This average translocation velocity

Electroosmotic flow drives protein transport
To determine how Gdm+ ions enable unidirectional transport of unfolded peptides, we built seven all-atom simulation systems, each containing a different 52-residue fragment of the MBP protein (Supplementary Table 1) threaded through α-hemolysin, a lipid membrane and 1.5 M GdmCl/1 M KCl electrolyte (Fig. 3a). For comparison, two additional variants of each system were built, differing by the electrolyte solution composition (1.5 M GdmCl and 2.5 M KCl). Each design was equilibrated using the all-atom MD method 50 and then simulated under a +200 mV bias for approximately 1,500 ns (see Methods for details).
In the case of the GdmCl/KCl electrolyte, 94% of the blockade current was carried by Cl− ions, whereas the current carried by Gdm+ and K+ ions was 6% and 0%, respectively (Fig. 3b). Similarly, strong ionic selectivity was observed for pure GdmCl electrolyte (Fig. 3c and Supplementary Fig. 20). The ionic selectivity was less pronounced but still substantial (70% and 30% for Cl− and K+ currents, respectively) for pure KCl. Consistent with the ion selectivity, we observed strong electroosmotic effects in all three systems (Fig. 3d,e and Supplementary Fig. 21). Further analysis found Gdm+ ions to accumulate at the inner nanopore surface, particularly near the termini of the α-hemolysin stem (Fig. 3f, top, and Supplementary Fig. 22). In the same regions, individual Gdm+ ions were observed to remain bound to the nanopore surface for considerable (>10 ns) intervals of time (Fig. 3f, bottom).
In all systems, the local concentrations of ionic species were found to satisfy the local electroneutrality condition (Supplementary Fig. 23). However, in the GdmCl/KCl system, K+ ions were almost excluded from the α-hemolysin stem (Supplementary Fig. 23), which explains their negligible current. Thus, binding of Gdm+ ions to the inner nanopore surface renders the surface positively charged. That surface charge is compensated by much more mobile chloride ions that carry most of the ionic current and produce a strong electroosmotic effect.
The electroosmotic effect produced by Gdm+ binding resulted in a small yet measurable net transport of the unfolded protein fragments through the nanopore. To show that, we computed the number of residues translocated through the nanopore constriction as a function of simulation time (Fig. 3g and Supplementary Fig. 24). To exclude the effect of peptide chain shrinking or stretching, we identified the parts of the simulation trajectories where the number of peptide residues within the α-hemolysin stem remained approximately constant (Supplementary Fig. 25). Averaged over such constant peptide-density trajectory fragments, the peptides were found to move with an average rate of 1.0 ± 0.8 and 0.8 ± 0.5 residues per µs for pure and mixed GdmCl electrolytes, respectively, and 0.1 ± 0.4 residues per µs for pure KCl (Fig. 3h).

Protein-specific current signals
To extract an 'average shape' of ionic current signals produced by the translocation of C-tagged MBP (MBP-D10), N-tagged MBP (D10-MBP) and C-tagged MBP dimer (diMBP-D10), their barycenters (Fréchet means) were computed using the Soft Dynamic Time Warping (SDTW) metric 51. The result of the barycenter computation is a smooth curve representing the centroid, or the 'essence', of the translocation event shapes in the dataset. A barycenter for each of the variant datasets is shown as a solid curve in Fig. 4a, superimposed on resampled events shown in the background (semitransparent, black). The event selection criteria (dwell time and current maximum ranges), the number of passing events and computation parameters are listed in Supplementary Table 4. We screened events with a narrow range of dwell times near the mean of each variant's distribution (3-5 ms for MBP-D10 and D10-MBP and 6-9 ms for diMBP-D10), which later allowed the SDTW algorithm to compute a clear and distinct signature shape for each variant. The current traces of the selected events were then resampled via interpolation to a segment count proportional to the average duration (300 points for MBP-D10 and D10-MBP and 500 points for diMBP-D10). Resampling reduces the effect of dwell time variation on this SDTW computation. The resulting barycenter curves show a trend of how the events of each protein type tend to progress, on average. Looking at the positioning of the local maxima and minima (pink and blue arrows, respectively), as well as the slope in the middle of the barycenter, we note that MBP-D10 and D10-MBP events show matching opposite trends (Fig. 4a, left and middle panel, respectively). Whether due to the protein's remaining secondary structure in the pore or purely due to sequence variation, the opposing current blockage trends are a strong indication of directional protein translocation. Further, the skewed 'W' shape of the diMBP-D10 barycenter and the position of its local minima and maxima appear similar to the MBP-D10 barycenter but repeated twice, especially the 'bumps' in the beginning and the middle of the barycenter (Fig. 4a, right panel).
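The resampling step that precedes the barycenter computation can be sketched as follows. This is a minimal illustration that assumes plain linear interpolation onto evenly spaced points; the text states only that events were "resampled via interpolation", so the interpolation scheme and the function name `resample_linear` are assumptions. The SDTW barycenter itself is not reproduced here.

```python
def resample_linear(trace, n_points):
    """Linearly interpolate a current trace onto n_points evenly spaced samples."""
    if len(trace) < 2 or n_points < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(trace) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the original trace
        i = min(int(pos), last - 1)       # left neighbour, clamped so i + 1 is valid
        frac = pos - i
        out.append(trace[i] * (1.0 - frac) + trace[i + 1] * frac)
    return out


# Events of different duration end up with the same number of points,
# e.g. 300 points for MBP-D10 and D10-MBP as stated in the text.
short_event = [float(i % 7) for i in range(750)]    # illustrative samples only
long_event = [float(i % 7) for i in range(1250)]
assert len(resample_linear(short_event, 300)) == len(resample_linear(long_event, 300))
```

Bringing every event to a common length removes most of the dwell-time variation before shapes are compared, which is the stated purpose of the resampling.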
To investigate whether the signal properties from single events comprise sufficient information to discriminate among different protein variants and protein classes, we employed supervised machine learning (ML). A gradient boosting classifier (GBC) was trained and tested for discrimination among MBP variants (MBP-D10, D10-MBP and diMBP-D10) and for distinguishing MBP-D10 from GFP-D10 in a binary mixture. Both GBC models were generated and evaluated with features extracted from labeled translocation events recorded one protein type at a time, where 80% of the data were used to train the model and the remaining 20% were withheld for testing the model's accuracy (Supplementary Note 4). While on a population-wide analysis, dwell time alone could inform the ratio of protein types in a binary mixture (Supplementary Fig. 26
translocation events (dwell time ranging between 300 µs and 20 ms) were used to train and test both GBC models. Specifically, each event was divided into ten segments of equal length and seven statistical parameters from every segment were extracted, creating a feature space comprising 70 dimensions for GBC model input (Supplementary Fig. 27 and Supplementary Note 4).
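The 70-dimensional feature vector can be sketched as below. The text names only the third quartile and the standard deviation among the seven per-segment statistics; the full set used here (mean, standard deviation, minimum, maximum, first quartile, median, third quartile) is an assumption made for illustration, as are the function names.

```python
import statistics


def segment_features(segment):
    """Seven summary statistics for one segment (an assumed feature set)."""
    q1, median, q3 = statistics.quantiles(segment, n=4, method="inclusive")
    return [
        statistics.fmean(segment),
        statistics.pstdev(segment),
        min(segment),
        max(segment),
        q1,
        median,
        q3,
    ]


def event_features(trace, n_segments=10):
    """Split a trace into n_segments near-equal parts and concatenate their statistics."""
    if len(trace) < 2 * n_segments:
        raise ValueError("trace too short: each segment needs at least two samples")
    # Integer boundaries so that segment lengths differ by at most one sample.
    bounds = [k * len(trace) // n_segments for k in range(n_segments + 1)]
    features = []
    for start, stop in zip(bounds, bounds[1:]):
        features.extend(segment_features(trace[start:stop]))
    return features


# Illustrative trace only: a linear ramp of 100 samples.
vector = event_features([float(i) for i in range(100)])
print(len(vector))
```

Each event then contributes one fixed-length row to the classifier input, regardless of its dwell time.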
As confusion matrices in Fig. 4b and Fig. 5e show, mean classification accuracies of 80.8% and 89.6% were achieved with GBC models trained for three-way classification of MBP variants and two-way classification of MBP-D10 and GFP-D10, respectively. Each confusion matrix is an average of nine reshuffled combinations of samples allocated into the training set and testing set (Supplementary Note 4 and Supplementary Figs. 28 and 29). We investigated the relative importance of each feature for GBC prediction in each case (Fig. 4c and Fig. 5f). We found the third quartile value in the eighth and ninth event segments to have the highest predictive power for discrimination of MBP variants. This suggests that there is a distinguishable blockade current near the end of MBP-D10, D10-MBP and diMBP-D10 translocation. In contrast, the current standard deviation of the second, third and fourth event segments had the greatest influence on GBC classification of MBP-D10 and GFP-D10. To support this result, we observed the greatest difference in local volume standard deviation to occur around these specific segments (Supplementary Figs. 30 and 31).
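The evaluation protocol, nine reshuffled 80/20 splits whose confusion matrices are averaged, can be sketched without committing to a particular classifier. The helpers below are hypothetical names; row normalization of the confusion matrix to percentages is an assumption, and the classifier that would produce the predicted labels is deliberately left out.

```python
import random


def reshuffled_splits(n_samples, n_repeats=9, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) for repeated shuffled hold-out splits."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_samples))
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield order[:n_train], order[n_train:]


def confusion_percent(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix: rows are true classes, entries are percentages."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


def mean_matrix(matrices):
    """Element-wise mean of equally shaped matrices, e.g. one per reshuffle."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]
```

In use, a model would be fitted on each training index set, its predictions on the matching test set passed to `confusion_percent`, and the nine resulting matrices combined with `mean_matrix`.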
To validate the discrimination capability of the trained model, we tested its performance on unlabeled mixture experiments. We note that simply deploying the model on a 50:50 mixture would not provide any useful information. Instead, we applied the trained GBC model on unlabeled events from mixture experiments containing different MBP-D10:GFP-D10 molar ratios and expected the classification ratios to follow the same trend. Figure 5a-c shows experimental traces after GBC classification for experiments with 20:80 (Fig. 5a), 50:50 (Fig. 5b) and 80:20 (Fig. 5c) MBP-D10 (red) to GFP-D10 (green) mixture, with the model's confidence score for each call labeled below the trace (Supplementary Note 4 and Supplementary Tables 6 and 7). Plotting the ratio of MBP-D10 to GFP-D10 predicted by the model versus the true concentration of MBP-D10 (Fig. 5d), we observe ~10% error by the model at an actual MBP-D10 ratio of 0%, followed by a trend where the number of MBP-D10 events called by the model increases roughly linearly with the actual concentration of MBP-D10 (as expected) and saturation at around 90% when the actual ratio of MBP-D10 is 100%. Model calls where MBP-D10 is higher than expected (20%, 40% and 50%) could be attributed to the higher capture rate observed with MBP-D10 in comparison to GFP-D10 (Supplementary Fig. 18). Example events from the training sets and mixture classification results are provided in Supplementary Fig. 32. The total event ratio predicted by the GBC model for the 50:50 mixture experiment is in excellent agreement with the respective population size obtained by integrating the mixture experiment's dwell time histogram after fitting with the 1D drift-diffusion model [52][53][54][55] (Supplementary Note 2 and Supplementary Fig. 19).
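The suggestion that unequal capture rates bias the called ratios can be made concrete with a short sketch. It assumes that each protein's event rate is proportional to its concentration times its own capture rate. The capture-rate ratio is a hypothetical parameter: the text reports that MBP-D10 is captured more readily than GFP-D10 but gives no value for their ratio here.

```python
def expected_event_fraction(molar_fraction_a, capture_rate_ratio=1.0):
    """Fraction of events expected from protein A in a binary mixture.

    capture_rate_ratio is (capture rate of A) / (capture rate of B), both per
    unit concentration; 1.0 means the two proteins are captured equally well.
    """
    if not 0.0 <= molar_fraction_a <= 1.0:
        raise ValueError("molar fraction must lie between 0 and 1")
    if capture_rate_ratio <= 0.0:
        raise ValueError("capture rate ratio must be positive")
    weighted_a = capture_rate_ratio * molar_fraction_a
    return weighted_a / (weighted_a + (1.0 - molar_fraction_a))


# Hypothetical ratio of 2: a 50:50 molar mixture would then yield about
# two thirds of the events from protein A.
print(expected_event_fraction(0.5, capture_rate_ratio=2.0))
```

Under this simple model the event fraction exceeds the molar fraction at intermediate ratios whenever protein A is captured more efficiently, and it agrees with the molar fraction at 0% and 100%.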

Discussion
The commercial success and maturity of nanopore-based nucleic acid sequencing, with its ability to directly read native strands at full length, sparked a sizable interest in developing nanopore-based proteomics. If realized, the single-molecule aspect of nanopore-based proteomics could offer solutions to shortcomings of mass spectrometry and Edman degradation-based methods, such as quantification of protein isoforms and detection of PTMs 14. The goal of our study was to identify and evaluate an enzyme-free method to translocate full-length proteins in a linearized state through nanopores, with downstream fingerprinting, isoform detection and ultimately protein sequencing applications in mind. We showed that our protein sensing platform, which consists of a WOP aperture for membrane support 37 and a synthetic polymer bilayer membrane 38 with a single insertion of a WT α-hemolysin nanopore, can withstand and function under denaturant (GdmCl) concentrations that are high enough to unfold analyte proteins. We further showed that capturing and initiating the threading of the unfolded proteins through the nanopore sensor required the addition of a charged tail, which forces the translocation to start from the tagged terminus. However, both our experimental results and molecular dynamics simulations showed that translocation progression of the protein after threading initiation no longer depended on the charged tail. Instead, the significant voltage-driven electroosmotic flow generated by the presence of Gdm+ ions within the pore lumen applies the stretching/driving force to complete the protein translocation. Lastly, we proved that the current blockade signals from protein translocation events contained distinct information which a machine-learning model could use to classify single protein molecules.
Fig. 5 caption (partial): [...] 6 for sample size of events parsed for each ratio experiment. e, Two-way confusion matrix representing the mean classification accuracy of a GBC model trained to discriminate MBP-D10 and GFP-D10 (mean ± s.d. of nine GBC models, each trained and tested on 80% of the data from a reshuffled dataframe, shown in Supplementary Fig. 28). f, A heatmap showing the relative importance of each feature used to generate the multiclass GBC model for discrimination of MBP-D10 and GFP-D10 (all 70 feature importance values sum to 100, shown percentages are rounded to the nearest integers). Each column of the heatmap represents a segment index of the event, and each row represents a statistical parameter extracted from every segment (refer to Supplementary Fig. 27 for further details on features). All experiments were performed in 1.0 M KCl, 2.0 M GdmCl, 10 mM Tris, pH 7.5 and under a 175 mV bias applied to the trans chamber. The combined concentration of MBP-D10 and GFP-D10 for every ratio experiment is 0.70 µM.
In our model experiments, we showed that protein threading is improved enormously by genetically engineering a charged D10 tail on our proteins. However, our platform can serve as an analytical tool only if it can measure WT proteins present in living organisms, which requires a general method to attach a charged tail to the N- or C-terminus of proteins without prior genetic engineering. In this work, we tagged a dT20 DNA oligo to the N-terminus of MBP and proved that it enhances protein capture. Despite the currently low labeling efficiency (~5-6%), DNA-protein terminal conjugation is highly important for native protein analysis, and future work is needed to optimize it or to develop more efficient polyionic tags and end-tagging chemistry.
For the protein molecules studied here, we find a linear relationship between the average protein velocity during translocation and voltage, which does not depend on either protein length or orientation of entry (MBP-D10 versus D10-MBP). The smooth motion of proteins through the nanopore is driven by electroosmotic force enhanced by the binding of Gdm+ ions to the lumen of the α-hemolysin, as elucidated by MD simulations. The mean residence time is ~10 µs per amino acid, much slower than DNA transport and perhaps slow enough to collect several current datapoints for each protein segment while it is at the pore constriction. With improved measurement time resolutions that could be achieved by reducing the noise of our apparatus, enzyme-free protein 'scanning' at high measurement bandwidths may prove useful for identifying proteins and resolving variants. Further, as seen from the haziness of the overlaid resampled events in the background traces of Fig. 4a, transport of a given molecule is likely to have some velocity variations along the translocation coordinate. As with nanopore-based DNA and RNA sequencing, it may be possible to eliminate the effect of the nonconstant velocity in protein translocation using a neural network analysis pipeline. However, the highest translocation velocities must not exceed the bandwidth capacity of the nanopore instrument to avoid signal loss.
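The velocity and residence-time figures quoted above follow from simple arithmetic, sketched here. The 0.34 nm rise per residue and the 5 nm lumen length are the values used in the Results; the 4 ms dwell time is an illustrative value inside the 3-5 ms window selected for MBP-D10, not a reported mean. The mobility helper divides a single velocity by a single field value, which assumes the velocity-voltage line passes through the origin, whereas the Results use the slope of a fit.

```python
RISE_PER_RESIDUE_NM = 0.34   # contour length per amino acid used in the text
LUMEN_LENGTH_NM = 5.0        # alpha-hemolysin lumen length D used in the text


def mean_velocity_nm_per_us(n_residues, dwell_time_us):
    """Contour length divided by dwell time."""
    return n_residues * RISE_PER_RESIDUE_NM / dwell_time_us


def residence_time_us_per_residue(n_residues, dwell_time_us):
    """Average time each residue spends in the pore."""
    return dwell_time_us / n_residues


def mobility_m2_per_Vs(velocity_nm_per_us, voltage_mV):
    """mu = v / E with E = V / D (assumes v is proportional to V)."""
    velocity_m_per_s = velocity_nm_per_us * 1e-3                 # 1 nm/us = 1e-3 m/s
    field_V_per_m = (voltage_mV * 1e-3) / (LUMEN_LENGTH_NM * 1e-9)
    return velocity_m_per_s / field_V_per_m


# MBP-D10: 389 residues, illustrative 4 ms dwell time, 175 mV bias.
v = mean_velocity_nm_per_us(389, 4000.0)
tau = residence_time_us_per_residue(389, 4000.0)
mu = mobility_m2_per_Vs(v, 175.0)
print(v, tau, mu)
```

With these inputs the velocity falls inside the 0.031-0.04 nm µs⁻¹ range reported for all molecules, and the residence time is close to 10 µs per amino acid.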
In an ideal nanopore-based protein sequencing or fingerprinting tool, the current blockade signals would contain the protein's sequence or subunit information. Here we pointed out that both resolution and bandwidth limitations in our experimental setup would limit the degree of access to that information. However, we observed that general trends in the shape of the blockade signal from protein variants are distinct and reproducible and used the blockade signals within events in a dynamic time-warping analysis to illustrate the unidirectionality of protein translocation, achieved by the choice of where to place the D10 terminal tag. We then used features calculated from the blockade signals as the input to our GBC model to show that the embedded information in the signal, albeit convoluted, could result in classification accuracies of 80.8% among the three MBP variants and 89.6% between MBP-D10 and GFP-D10. Deploying the trained GBC model on binary mixtures of MBP-D10 and GFP-D10 at different molar ratios showed that the classification results are quantitative and in agreement with the relative molar ratios of proteins in the experiment. Based on these results, we speculate that our platform, in its current form and sensing resolution, may be capable of quantitative detection of protein isoforms, provided that a sufficiently large set of molecular standards exists which could be used for signal training. However, acquiring such data demands not only pore multiplexing and higher throughput but also efficient terminal tagging chemistry for full-length native proteins.
We noted that GdmCl both facilitates protein unfolding and generates a stretching/driving force through electroosmotic flow. This electroosmotic flow that can drive an unstructured protein chain through a pore is important not only in the enzyme-free approach we presented here but also in keeping a protein chain taut at the pore during enzyme-mediated protein sequencing. As apparent in the data by Nivala et al. (refs. 32,33), unfoldase-mediated protein translocation (pull-through) is susceptible to the jamming of the trailing folded or unfolded parts of the protein into the pore lumen, which modulates the current and corrupts the 'favored' blockade signal generated by the protein regions residing in the pore constriction site. We speculate that the addition of GdmCl to the cis-side of the pore in such unfoldase-mediated nanopore systems could initiate the electroosmotic flow and help prevent this jamming. For this to work, it may be required that the electrolyte concentrations and/or applied voltage are adjusted to generate an electroosmotic flow opposite to the pulling force of the unfoldase, which will ensure that the peptide chain in the pore is stretched. Further, unlike large detergent-like micellar denaturants such as SDS, Gdm+ ions are small, and we do not anticipate them to heavily coat the interacting protein side chains and degrade the readout signal quality. We finally note that the highly denaturing conditions of GdmCl are compatible not only with α-hemolysin but also with MspA, which offers a potentially higher resolution while maintaining its structure and function in 2 M GdmCl 56. Other improvements such as changing the electrolyte type 57, pore variant [58][59][60] and voltage waveform 61 can be made to further enhance the ability of our method to obtain unique fingerprint signatures or sequence from full-length proteins, which would usher in a new era in single-molecule proteomics.

Polymer bilayer painting and nanopore measurement
The 100 µm SU-8 wedge-on-pillar aperture supported by a 500 µm-thick Si chip with a square open window 37 was mounted on our custom-designed fluidic cell, sealing properly to separate cis and trans chambers. Both sides of the aperture were pretreated with 4 mg ml⁻¹ poly(1,2-butadiene)-b-poly(ethylene oxide) (PBD11-PEO8) block-copolymer (Polymer Source) dissolved in hexane to coat the aperture with a dry and thin polymer layer. The cis and trans chambers were filled with GdmCl electrolyte (all contain 1 M KCl, 10 mM Tris, pH 7.5), and a pair of Ag/AgCl electrodes were immersed in the electrolyte and connected to an Axon 200B patch-clamp amplifier. The polymer membrane was painted across the aperture using 8 mg ml⁻¹ polymer dissolved in decane. At least a 60-min waiting time was required until the polymer membrane thinned to a capacitance value of 60-80 pF. After verification of bilayer formation, 0.5 µl of 50 µg ml⁻¹ α-hemolysin (Sigma-Aldrich) was added to the cis chamber and an ion conductance jump marked single pore insertion. A denatured protein sample (incubated in GdmCl buffer before use) was added to the cis chamber and mixed gently by pipetting. Current signals were low-pass filtered at 100 kHz using the Axopatch setting and digitized at 16 bits and 250-kHz sampling rate using a National Instruments Data Acquisition card and custom LabVIEW-based software that records and saves all raw current data and acquisition settings. In further data analysis, digital low-pass filtering was used to further reduce the noise.
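The acquisition settings above determine how many digitized points are available per event and per amino acid. The following sketch does that arithmetic using the stated 250 kHz sampling rate; the 4 ms dwell time and the ~10 µs residence time per residue are taken as illustrative inputs, and the function names are hypothetical. The count is an upper bound on usable information, since the 100 kHz analog filter and any further digital low-pass filtering make neighbouring samples correlated.

```python
def samples_per_event(dwell_time_us, sampling_rate_khz=250.0):
    """Number of digitized points recorded during one event."""
    # kHz * us = 1e3 s^-1 * 1e-6 s = 1e-3, hence the division by 1000.
    return dwell_time_us * sampling_rate_khz / 1000.0


def samples_per_residue(residence_time_us, sampling_rate_khz=250.0):
    """Number of digitized points recorded while one residue passes the pore."""
    return samples_per_event(residence_time_us, sampling_rate_khz)


print(samples_per_event(4000.0))     # a 4 ms event
print(samples_per_residue(10.0))     # ~10 us per amino acid
```

At the stated sampling rate only a few raw points are recorded per residue, which is consistent with the resolution and bandwidth limitations acknowledged in the Discussion.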

Cloning of the GFP and MBP constructs
All primers (Eurofins MWG Operon) used in this study are listed in Supplementary Table 1. The N-terminal 10-aspartate MBP (D10-MBP), C-terminal 10-aspartate MBP (MBP-D10) and C-terminal 10-aspartate GFP (GFP-D10) constructs were obtained by mutagenesis polymerase chain reaction (PCR) using pT7-MBP or pRSETB-GFP as the template plasmid. The PCR mixtures were subjected to DpnI digestion for 3 h at 37 °C to degrade the template plasmids. The digested samples were then transformed into chemically competent Escherichia coli DH5α cells. The desired variant plasmids were isolated from colonies and verified by DNA sequencing.
The C-terminal MBP-D10 dimer construct (diMBP-D10) was generated as follows. The first mutagenesis PCR was performed using pT7-hisMBP as the template to remove the stop codon and add a flexible GGSG linker to the C-terminus of the MBP gene. The PCR products were digested with DpnI and transformed into E. coli DH5α cells, resulting in the plasmid pT7-hisMBPggsg, which contains the HindIII and Sfbl restriction sites immediately after the GGSG linker sequence. The second PCR was performed with pT7-MBP as the template to introduce HindIII and Sfbl cutting sites at the two ends of the MBP gene and to add a D10 tail at the C-terminus of the MBP fragment. The PCR products and the plasmid pT7-hisMBPggsg were digested with HindIII and Sfbl and ligated using T4 ligase. The ligated products were transformed into chemically competent E. coli DH5α cells. The variant plasmid pT7-diMBP-D10 was verified by enzyme digestion and DNA sequencing.

Expression and purification of GFP and MBP proteins
GFP and MBP protein variants (Supplementary Table 2) were expressed and purified using similar protocols. Briefly, plasmids were transformed into chemically competent BL21(DE3) E. coli cells. The cells were grown in 1 l of LB medium at 37 °C until the OD600 reached 0.6 and were then induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside. The temperature was then decreased to 16 °C for overnight expression. Cells were collected by centrifugation at 13,000 RPM for 25 min. The cell pellets were used for protein purification or frozen at −20 °C for future use. For purification, cells were resuspended in 50 ml of 50 mM Tris-HCl (pH 8.0), 150 mM NaCl buffer and lysed by sonication. The lysate was centrifuged at 13,000 RPM for 25 min. The supernatant was filtered through a 0.22 µm syringe filter (CELLTREAT Scientific Products) and then loaded onto a Ni-NTA affinity column (Thermo Fisher Scientific) equilibrated with 50 mM Tris-HCl (pH 8.0), 150 mM NaCl. MBP-D10, D10-MBP and diMBP-D10 were eluted in 50 mM Tris-HCl (pH 8.0), 150 mM NaCl and 150 mM imidazole. GFP-D10 was eluted in 50 mM Tris-HCl (pH 8.0), 150 mM NaCl and 20 mM imidazole. After Ni-NTA chromatography, MBP-D10, D10-MBP and GFP-D10 exhibited more than 95% purity on SDS-PAGE, whereas the eluted diMBP-D10 fraction contained multiple low-molecular-weight impurity bands. The eluted diMBP-D10 samples were therefore run on a preparative 12% SDS-PAGE gel to remove these impurities. The band containing full-length diMBP-D10 was cut out, and the protein was extracted from the gel by incubating it overnight at room temperature in 50 mM Tris-HCl (pH 8.0), 8 M urea. The supernatant containing the protein was collected by centrifuging the samples at 13,000 RPM for 30 min. Protein concentrations of all samples were determined by A280 with a Nanodrop, and the samples were stored at −80 °C for future use.

MD simulation
All MD simulations were performed using the molecular dynamics program NAMD2 (ref. 64), a 2-fs integration time step, periodic boundary conditions, the CHARMM36 (ref. 65) force field and custom nonbonded fix (NBFIX) corrections for K, Cl and Gdm ions (ref. 66). The SETTLE algorithm (ref. 67) was used to maintain covalent bonds to hydrogen atoms in water molecules, whereas the RATTLE algorithm (ref. 68) maintained all other covalent bonds involving hydrogens. The particle-mesh Ewald (ref. 69) method was employed to compute long-range electrostatic interactions over a 1.2 Å grid. All van der Waals and short-range electrostatic interactions were evaluated every time step using a cutoff of 12 Å and a switching distance of 10 Å; full electrostatics were evaluated every second time step.
The all-atom models of α-hemolysin suspended in a lipid bilayer membrane were built using CHARMM-GUI (ref. 70). The initial structural model of α-hemolysin was taken from the Protein Data Bank (PDB ID: 7AHL) (ref. 71). After adding missing atoms and aligning the primary principal axis of the protein with the z axis, the protein structure was merged with a 15 × 15 nm² patch of a pre-equilibrated 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine lipid bilayer. The protein-lipid complex was then solvated in a rectangular volume of ∼78,500 pre-equilibrated TIP3P water molecules (ref. 72). Gdm⁺, K⁺ and Cl⁻ ions were added at random positions in numbers corresponding to the target ionic concentrations. Additional ions were introduced to neutralize the system. Each final system was 15 × 15 × 18 nm³ in volume and contained approximately 300,000 atoms. Upon assembly, the systems were initially equilibrated using the default CHARMM-GUI protocol. Specifically, the systems were subjected to energy minimization for 10,000 steps using the conjugate gradient method. Next, lipid tails and protein side chains were relaxed in a 2.5-ns pre-equilibration simulation that was run while restraining the protein backbones and lipid head groups. This step was followed by a 25-ns simulation in the NPT (constant number of particles, pressure and temperature) ensemble using the Nosé-Hoover Langevin piston pressure control (ref. 73). In all simulations, the temperature was maintained at 298.15 K by coupling all nonhydrogen atoms to a Langevin thermostat with a damping constant of 1 ps⁻¹.
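
As a rough illustration of how a target concentration translates into an ion count for a box of this size, the sketch below uses the common rule of thumb that bulk water is about 55.5 mol l⁻¹. This estimate, the function name `ion_pairs_for_concentration` and the resulting numbers are assumptions for illustration; the section does not state how the ion numbers were actually chosen, and at high salt the displaced water volume makes the estimate approximate.

```python
WATER_MOLARITY = 55.5  # mol/l of bulk water, approximate


def ion_pairs_for_concentration(conc_molar, n_water):
    """Estimate salt ion pairs for a target molarity from the water count.

    Rule-of-thumb estimate (assumption): one ion pair per
    WATER_MOLARITY / conc_molar water molecules.
    """
    return round(conc_molar * n_water / WATER_MOLARITY)


N_WATER = 78_500  # approximate water count quoted in the text

kcl_pairs = ion_pairs_for_concentration(1.0, N_WATER)    # 1 M KCl
gdmcl_pairs = ion_pairs_for_concentration(1.5, N_WATER)  # 1.5 M GdmCl
```

Any counterions needed to neutralize the protein charge would be added on top of these estimates.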
The atomic coordinates of MBP were obtained from the Protein Data Bank (entry 1JW4) (ref. 74). The missing hydrogen atoms were added using the psfgen plugin of VMD (ref. 75). The protonation state of each titratable residue was determined using PROPKA (ref. 76) according to the experimental pH (7.5). Next, the protein was split into seven peptide fragments: six 53-residue fragments and one 52-residue fragment. The N-terminus of each peptide was terminated with a neutral acetyl group (ACE patch), whereas the C-terminus was terminated with an N-methyl group (CT3 patch). Each peptide was stretched using constant-velocity SMD in vacuum, followed by 5 ns of equilibration in a 1.5 M GdmCl solution. During the 150 ps SMD run, the C-terminus of the peptide was kept fixed. At the same time, the N-terminus was coupled to a dummy particle through a harmonic potential (k_spring = 7 kcal mol⁻¹ Å⁻²), and the dummy particle was pulled with a constant velocity of 1 Å ps⁻¹. At the end of the equilibration step, each peptide fragment had a contour length of approximately 167 Å, or ~3.16 Å per residue. Next, we used the phantom pore method (ref. 48) to convert the geometrical shape of the α-hemolysin nanopore to a mathematical surface. To fit the stretched peptide into the α-hemolysin pore, the phantom pore surface was initially made to represent a nanopore that was 1.4 times wider than the pore of α-hemolysin. During a 2 ns simulation, the phantom pore was gradually shrunk to match the shape of the α-hemolysin nanopore. At the same time, all atoms of the peptide and all ions lying outside the potential were pushed toward the nanopore center using a constant 50 pN force. At the end of the simulation, each peptide fragment and all guanidinium ions residing within 3 Å of any peptide atom were placed inside the pre-equilibrated α-hemolysin system, with the peptide's backbone approximately aligned with the nanopore axis. The system was re-ionized to match the target ion concentration and minimized for 4,800 steps. Before the production runs, each system was equilibrated for 20 ns in the NPT ensemble at 1.0 bar and 298.15 K with all Cα atoms of the α-hemolysin protein restrained to the crystallographic coordinates.
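
The fragment bookkeeping in this paragraph can be checked with a short sketch. The total of 370 residues is inferred from the stated fragment sizes (six of 53 and one of 52), and the helper name `split_lengths` is hypothetical; the sketch only reproduces the arithmetic and is not the procedure used to build the models.

```python
def split_lengths(n_residues, n_fragments):
    """Split a chain into near-equal fragment lengths, longer ones first."""
    base, extra = divmod(n_residues, n_fragments)
    return [base + 1] * extra + [base] * (n_fragments - extra)


# Chain length implied by the fragment sizes quoted in the text (assumption)
MBP_RESIDUES = 6 * 53 + 52

lengths = split_lengths(MBP_RESIDUES, 7)  # six 53-residue and one 52-residue

# Displacement of the dummy particle during the SMD run: 150 ps at 1 Å/ps
pull_distance_angstrom = 150 * 1.0

# Contour length per residue for a 53-residue fragment of about 167 Å,
# about 3.15 Å, in line with the ~3.16 Å quoted given rounding
length_per_residue = 167 / 53
```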
All production simulations were carried out in the constant number of particles, volume and temperature (NVT) ensemble under a constant external electric field applied normal to the membrane, producing a ±200 mV transmembrane bias. To maintain the nanopore's structural integrity, all Cα atoms of the protein were restrained to their coordinates in the last frame of the equilibration trajectory using harmonic potentials with spring constants of 1 kcal mol⁻¹ Å⁻². The ionic currents were calculated as described previously (ref. 50). To quantify protein translocation, we defined the number of residues translocated as the number of nonhydrogen backbone atoms that had passed below the α-hemolysin constriction divided by the number of nonhydrogen backbone atoms in one residue. The constriction's z coordinate was defined by the center of mass of the backbone atoms of residues 111, 113 and 147. The concentration profile and guanidinium binding analyses were carried out using in-house VMD scripts. All MD trajectories were visualized using VMD (ref. 75).
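
The two quantities defined in this paragraph can be sketched as follows. This is a minimal illustration, not the in-house analysis scripts: the function names are hypothetical, four heavy backbone atoms per residue (N, CA, C, O) is an assumed atom selection, "below" is taken to mean smaller z, and the field is computed from the standard relation E = V / Lz with the nominal 18 nm box height, which may differ from the equilibrated value.

```python
KCAL_PER_MOL_PER_EV = 23.0605          # 1 eV expressed in kcal/mol
BACKBONE_HEAVY_ATOMS_PER_RESIDUE = 4   # N, CA, C, O (assumed selection)


def field_for_bias(bias_volts, box_z_angstrom):
    """Uniform field E = V / Lz, in kcal/(mol Å e), for a target bias."""
    return KCAL_PER_MOL_PER_EV * bias_volts / box_z_angstrom


def center_of_mass_z(z_coords, masses):
    """Mass-weighted mean z, e.g. for the constriction backbone atoms."""
    total_mass = sum(masses)
    return sum(z * m for z, m in zip(z_coords, masses)) / total_mass


def residues_translocated(backbone_z, constriction_z):
    """Heavy backbone atoms below the constriction, converted to residues."""
    n_below = sum(1 for z in backbone_z if z < constriction_z)
    return n_below / BACKBONE_HEAVY_ATOMS_PER_RESIDUE


# Field for a 200 mV bias across a nominal 180 Å box (assumed height)
field = field_for_bias(0.2, 180.0)
```

Evaluating `residues_translocated` on every trajectory frame would give a translocation trace whose slope is the translocation rate in residues per unit time.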

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 1 | Enzyme-free full-length protein translocation through nanopores. a, Schematic cut-away view of a PBDₙ-PEOₘ block-copolymer bilayer (inset shows polymer structure) suspended on the WOP aperture (orange depicts lipid solvent), with an α-hemolysin nanopore inserted into the bilayer. The guanidinium chloride (GdmCl) buffer unfolds the analyte proteins while leaving the α-hemolysin nanopore intact. A pair of Ag/AgCl electrodes generates the transmembrane voltage and measures the ionic current through the nanopore. b, Current-voltage dependence of a single α-hemolysin channel at several buffer conditions (V is applied to the trans chamber). c, Graphical representations of the formal charges at pH 7.5 for the protein constructs used in this study in their unfolded (solvent accessible) state. A plot above each charge graph shows the relative amino acid volumes (pink). d, Current versus time trace (left) and the fractional blockade versus dwell time scatter plot (right) recorded from WT-MBP at V = 175 mV ([WT-MBP] = 0.35 µM). Open pore current (I_o) is indicated above each trace. e, Same as in d but for MBP containing a C-terminus aspartate tail (MBP-D10; [MBP-D10] = 0.35 µM). The scatter plot indicates the fraction of detectable events near the dashed circle.

Fig. 2 | Transport properties of unfolded protein analytes. a, Fractional current blockade versus dwell time scatter plots for MBP-D10 in 1.0 M, 1.5 M and 2.0 M GdmCl buffer (+1 M KCl). Red and blue ovals show populations that correspond to the P_F (partially folded) and P_L (linear or unfolded) states of MBP-D10, respectively. [MBP-D10] = 0.7 µM for the 1.0 M GdmCl experiment and 0.35 µM for the 1.5 M and 2.0 M GdmCl experiments. b, Mean dwell time versus voltage with exponential fitting for the P_L populations of MBP-D10 and diMBP-D10, respectively (error bars represent the FWHM of the distribution fits). c, Protein transport velocities calculated from estimated protein contour lengths and observed dwell times as a function of applied voltage (error bars are based on the dwell time distribution widths shown in b). Buffer conditions for data shown in b and c are 10 mM Tris, pH 7.5, 1 M KCl, and either 1.5 M or 2.0 M GdmCl. For the latter, data at only one voltage (175 mV) are shown. d, Dwell time histograms for GFP-D10, MBP-D10, D10-MBP and diMBP-D10 along with the mean diffusion coefficients (nm² µs⁻¹) and velocities (nm µs⁻¹) determined from fits to the 1D Fokker-Planck equation (refs. 53,62,63). e, Fractional current blockade versus dwell time scatter plots and dwell time histograms for C-terminus (MBP-D10) versus N-terminus (D10-MBP) threading and transport of full-length MBP. Experiments in d and e were performed in 1 M KCl, 2.0 M GdmCl, 10 mM Tris, pH 7.5, under a 175 mV bias applied to the trans chamber.

Fig. 3 | MD simulation of ion, water and peptide transport through α-hemolysin. a, All-atom model of α-hemolysin (gray) containing a fragment of the MBP protein (orange), embedded in a lipid membrane (blue) and submerged in the 1.5 M GdmCl / 1.0 M KCl electrolyte mixture. b, Total charge carried by ion species in seven independent MD simulations differing by the sequence and initial conformation of the MBP fragment. Hereafter each trace is shown using two alternating colors to indicate data from independent trajectories. The traces are added consecutively to appear as a continuous permeation trace. The slope indicates the average current. c, Average ionic current for the three electrolyte conditions. Hereafter, the average and the standard error are calculated considering each trajectory-averaged value as a result of an

Fig. 4 | Unidirectional translocation and discrimination of MBP variants. a, Soft-DTW barycenters (solid curves) showing resampled events of MBP-D10 (left), D10-MBP (middle) and diMBP-D10 (right), with their respective events superimposed in the background (black). The positions of identified maxima (pink arrows) and minima (blue arrows) are shown. b, Three-way confusion matrix representing the mean classification accuracy of a multiclass GBC trained to discriminate MBP-D10, D10-MBP and diMBP-D10 (mean ± s.d. of nine GBC models, each trained and tested on 80% of the data from a reshuffled dataframe, shown in Supplementary Fig. 27). c, A heatmap showing the relative importance of each feature used to generate the multiclass GBC model for discrimination of MBP variants (all 70 feature importance values sum to 100; percentages shown are rounded to the nearest integer). Each column of the heatmap represents a segment index of the event, and each row represents a statistical parameter extracted from every segment (refer to Supplementary Fig. 27 for further details on features). All experiments were performed in 1.0 M KCl, 2.0 M GdmCl, 10 mM Tris, pH 7.5 and under a 175 mV bias applied to the trans chamber.

Fig. 5 | Single-molecule fingerprinting of full-length MBP-D10 and GFP-D10. a-c, Post-training GBC classification results for unlabeled MBP-D10 (red) and GFP-D10 (green) mixture experiments at 20:80 (a), 50:50 (b) and 80:20 (c) ratios with a probability classification estimate associated with each translocation event. d, Percentage of MBP-D10 (red) to GFP-D10 (green) predicted by the GBC model (shown on the y axis) when applied to different MBP-D10:GFP-D10 ratio experiments. Each marker is the result of nine GBC models, each fit with 80% of the training data from a reshuffled dataframe containing features from pure MBP-D10 and GFP-D10 experiments (mean ± s.d.). Experiments conducted on different days at 50:50 ratio are shown as hollow markers, highlighting experimental variability. Refer to Supplementary Table 6 for sample size of events parsed for each ratio experiment. e, Two-way confusion matrix representing the mean classification accuracy of a GBC model trained