Development of high affinity monobodies recognizing SARS-CoV-2 antigen

The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been a threat to global public health. Prompt patient identification and quarantine is the most effective way to control its rapid transmission, which can be facilitated by early detection of viral antigens. Here we present a platform to develop and optimize fibronectin-based, affinity-enhanced antibody mimetics (monobodies) for recognizing viral antigens. Specifically, we developed monobodies targeting the SARS-CoV-2 nucleocapsid (N) protein. We showed that two monobodies, NN2 and NC2, bind to the N- and C-terminal domains of N protein, respectively, with a Kd in the nM range. The specificity of the recognition was confirmed with co-immunoprecipitation and immunofluorescence assays. Furthermore, we demonstrated that one round of in vitro maturation using mRNA display can improve the binding affinity of monobodies. Machine learning algorithms were integrated with deep sequencing data for selecting candidates that improve the detection sensitivity of N. Using this pair of monobodies, we have developed an enzyme-linked immunosorbent assay (ELISA) for viral detection. We were able to detect recombinant N at 4 pg/ml and detect N in viral culture supernatant, with no cross-reactivity with other CoV. Integrating high-density mutagenesis, mRNA display, deep sequencing and machine learning, this platform can be applied through iterations to identify and optimize monobodies against emerging viral antigens, potentiating point-of-care detection of communicable diseases in a cost- and time-sensitive manner.


Introduction
Since December 2019, there has been an ongoing global outbreak of coronavirus disease 2019 (COVID-19), a highly contagious viral pneumonia causing fever and shortness of breath (Chan et al., 2020; Heymann and Shindo, 2020; Huang et al., 2020; Wang et al., 2020a; WHO, 2020). The disease spread to more than 100 countries and regions within 3 months, and the World Health Organization (WHO) declared it a pandemic in early March 2020. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, has an estimated reproduction number (R0) of around 2-2.5 without effective control, higher than that of seasonal influenza virus (Kucharski et al., 2020; Liu et al., 2020; Zhao et al., 2020). To break the chain of infection, prompt diagnosis and quarantine of patients, especially at the early phase of infection, is crucial for reducing the chance of community transmission (Wang et al., 2020b; Wilder-Smith and Freedman, 2020). Nevertheless, current diagnostic techniques have various limitations. [...] (Kim et al., 2020; Thevarajan et al., 2020; Wang et al., 2020c; Wölfel et al., 2020; Xie et al., 2020), while serological tests are ineffective in detecting active and recent infections before antibody responses occur (Amanat et al., 2020; Pan et al., 2020; Xia et al., 2020; Xiang et al., 2020). There is hence an urgent need to improve diagnostic techniques to achieve high sensitivity and speed for effective control of COVID-19.
A sensitive viral antigen detection reagent is one way of advancing diagnostics for emerging diseases. An ideal reagent for antigen recognition is expected to fulfill several criteria, including (1) efficacy (high sensitivity and specificity), (2) scalability (production at low cost and high speed), (3) customizability (selection of desired properties at a defined condition in vitro) and (4) small size (compatibility with sensitive nano-platforms and in vivo imaging). To date, antibodies are the most widely used reagents in conventional assays for viral antigen detection, including ELISA, chemiluminescent enzyme immunoassay (CLIEA) and lateral flow immunoassay (LFIA) (Chevaliez and Pawlotsky, 2008; Young et al., 2000). Despite being effective, antibody-based detection has several limitations. Firstly, it takes at least a few months to develop specific antibodies, which often involves immunizing animal hosts and selecting monoclones. Secondly, generation of antibodies is relatively costly and subject to batch-to-batch differences (Sajid et al., 2015). Thirdly, the application of antibodies is usually limited to near-physiological conditions because they lack resistance to high temperature or pH extremes. Lastly, the large size of common antibodies restricts their applications in nanobiosensors, such as field-effect transistor (FET) nanosensors, where signal readings are limited by the Debye screening length (Vu and Chen, 2019). A sensitive and small recognition reagent that overcomes the above limitations of antibodies would be useful in facilitating population-wide screening of COVID-19.
Previously, using the mRNA display method, we developed a pair of fibronectin-based antibody mimetics (monobodies), NN1 and NC1, for detecting the nucleocapsid (N) protein of SARS-CoV (SARS) with a Kd in the sub-nM range. Evolved from the 10th fibronectin type III domain (10Fn3), these monobodies can be complementary to conventional antibodies and overcome some of the above limitations. They comprise a beta-sheet sandwich structure that is energetically favorable, and their production can be easily scaled up in E. coli for time-sensitive mass production at a low cost (Olson and Roberts, 2007). With a small size of 3 nm in diameter, they are also suitable for use in conjunction with nanotechnology. We demonstrated that the monobodies can be conjugated onto single-walled carbon nanotubes and facilitate large-scale parallel viral detection at high speed (Ishikawa et al., 2009). Thus, these monobodies may complement antibodies as detection reagents by having (1) high binding affinity and specificity, (2) low cost, and (3) small protein size.
Here we present a platform that can rapidly and iteratively evolve the binding affinity of monobodies against SARS-CoV-2 N. By mutagenizing the parental sequences, we evolved a pair of affinity-enhanced monobodies that could efficiently recognize SARS-CoV-2 N in vitro and in vivo. The two monobodies, namely N-NTD-2 (NN2) and N-CTD-2 (NC2), bind to the N-terminal domain (NTD) and C-terminal domain (CTD) of SARS-CoV-2 N, respectively, with a high affinity demonstrated with SPR and ITC. NN2 and NC2 can form a protein complex with N, and can specifically recognize endogenous N in virus-infected cells. We also demonstrated the feasibility of using one-round in vitro maturation and machine learning algorithms to further improve the binding affinity. Additionally, we demonstrated the potential diagnostic utility of monobodies with ELISA-based detection of SARS-CoV-2 N with a sensitivity of 4 pg/ml. The ELISA can also be used to detect N in viral supernatant with no cross-reactivity against other CoV. This platform can be generally applied to develop detection reagents for emerging viral antigens in a time- and cost-effective manner.

Results
Development of fibronectin-based monobodies against SARS-CoV-2-N
By randomizing the BC and FG loops, 10Fn3 has been developed as an antibody mimetic that recognizes different targets (Koide et al., 1998). It is topologically analogous to the immunoglobulin VH domain, and expresses well in both eukaryotic and prokaryotic cells (Olson and Roberts, 2007). Previously, using mRNA display of a randomized 10Fn3 library, we generated two monobodies for SARS N with a Kd in the nM range. The monobody binding to the N-terminal domain (NTD) of SARS N with a Kd=72 nM is called NN1, and the monobody binding to the C-terminal domain (CTD) of the N protein with a Kd=1.7 nM is called NC1.
To evaluate potential binding of NN1 and NC1 to SARS-CoV-2 N and examine the cross-reactivity with other CoV, we first performed a sequence comparison of the N proteins of SARS-CoV-2, SARS, MERS-CoV (MERS), and other CoV causing mild upper respiratory disease (OC43, 229E and NL63) (Stothard, 2000). SARS-CoV-2 N is highly similar to SARS N, with 91% identity and 94% similarity (Figure 1A). The NTD and CTD of SARS-CoV-2 N are almost identical to those of SARS N (96.9% and 98.3% similarity, respectively), suggesting that our monobodies (NN1 and NC1) may also recognize SARS-CoV-2. Unlike SARS N, the N proteins of other CoV are dissimilar from SARS-CoV-2 N, which has 50% identity with MERS and 28-35% identity with the others (Figure 1A). The phylogenetic tree also suggested that SARS-CoV-2 is closely related to SARS, but distant from other CoV (Supplementary Figure 1A) (Stecher et al., 2020). Moreover, the sequence of N, particularly the NTD and CTD that are potentially recognized by our monobodies, is highly conserved among reported SARS-CoV-2 sequences. This indicates that the NTD and CTD of N are robust targets for SARS-CoV-2 detection (Supplementary Figure 1B).
We performed in vitro maturation using a doped library to further improve the affinity and stability of the monobodies (Figure 1B). Using NN1 and NC1 as backbones, we generated mutations in all 17 amino acids across the BC and FG loops. The diversity was biased towards the parental monobodies using doped nucleotide mixtures (Supplementary Figure 1C&D). The resulting library contained mutations towards all 19 other amino acids at each position, with 23-37% remaining as the parental amino acid (Supplementary Figure 1E). As the SARS-CoV-2 N sequence was not available at the time of the experiment, SARS N was used as the bait to select the mutant libraries through mRNA display. After 6 rounds of selection with an increased binding temperature (Methods, Extended Data Figure 2), we identified and confirmed two monobodies with increased binding capacity to SARS N (NN2 and NC2, Figure 1C). The enhancement in affinity of NN2 and NC2 was quantified by surface plasmon resonance (SPR). NN2 and NC2 were expressed in bacteria and purified by affinity chromatography, while SARS N, expressed and purified from 293T cells, was immobilized on Biacore chips as the ligand. NN2 achieved a 22-fold increase in binding affinity, resulting in a Kd=3.3 nM. NC2 has a binding constant of 390 pM with SARS N, corresponding to about a 4-fold affinity improvement compared with parental NC1 (Supplementary Figure 2A&B).
After the sequence of SARS-CoV-2 became available, we tested whether these binders have affinity to SARS-CoV-2 N, prompted by the above sequence analysis (Figure 1D). By quantifying their binding affinity to SARS-CoV-2 N with SPR, we observed that NN2 has an on-rate of 3.1 × 10^5 M^-1 s^-1 and an off-rate of 8.5 × 10^-3 s^-1, resulting in a Kd=27.6 nM. NC2 has an on-rate of 5.7 × 10^5 M^-1 s^-1 and an off-rate of 3.9 × 10^-3 s^-1, resulting in a Kd=6.7 nM. These data suggested that these monobodies bind SARS-CoV-2 N in vitro with high affinity.
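As a quick consistency check on the kinetic values above, the dissociation constant of a simple 1:1 interaction is the ratio of off-rate to on-rate. The sketch below assumes that 1:1 model (the fitting model is not stated in the text) and uses a hypothetical helper name. Because the published rates are rounded to two significant figures, the recomputed values differ slightly from the reported 27.6 nM and 6.7 nM.

```python
def dissociation_constant_nM(k_on, k_off):
    """Kd in nM for a 1:1 binding model, with k_on in M^-1 s^-1 and k_off in s^-1."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on * 1e9  # convert M to nM


# Rounded rate constants as quoted in the text (k_on, k_off).
rates = {
    "NN2": (3.1e5, 8.5e-3),
    "NC2": (5.7e5, 3.9e-3),
}

for name, (k_on, k_off) in rates.items():
    kd = dissociation_constant_nM(k_on, k_off)
    print(f"{name}: Kd ~ {kd:.1f} nM")
```

The same ratio applied to the parental and matured binders for SARS N (72 nM versus 3.3 nM, 1.7 nM versus 390 pM) gives the roughly 22-fold and 4-fold improvements quoted earlier.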

Monobodies recognize endogenous SARS-CoV-2-N in vivo
Following verification of the high-affinity binding between the two monobodies and SARS-CoV-2 N in vitro, we examined whether they can specifically bind to endogenous N in cells. Specifically, we examined whether both still bind to their respective domains of SARS-CoV-2 N. A co-immunoprecipitation (co-IP) experiment was performed in 293T cells using truncated fragments of SARS-CoV-2 N (Figure 2A). Consistent with parental NN1 and NC1, NN2 bound to full-length N, but lost the interaction with the CTD alone or when the NTD was deleted (N-deltaN), suggesting its specific binding to the NTD of SARS-CoV-2 N. On the other hand, NC2 retained binding to the CTD, suggesting that it is a CTD binder. Using sequential IP, we further demonstrated that NN2, NC2 and SARS-CoV-2 N formed a protein complex in cells, indicating that the binding of these two monobodies to SARS-CoV-2 N is not competitive (Figure 2B). Independent binding of the two monobodies was also supported by SPR analysis. When both monobodies were combined at a concentration exceeding saturation, the response was additive, totaling 94% of the sum of the separate binding responses, demonstrating that neither molecule obscures the other (Figure 2C). The above results, consistent with the fact that the CTD and NTD are separated by a flexible region on N protein, support the feasibility of using this pair of monobodies simultaneously to establish a sandwich ELISA for antigen detection.
After investigating the physical interaction between the two binders and SARS-CoV-2 N, we validated their binding in situ by immunofluorescence assay (IFA). Fibronectin-based monobodies have been shown to localize in the nucleus upon overexpression in mammalian cells. However, with the expression of SARS-CoV-2 N in A549 cells, the localization of the monobodies became cytosolic (Supplementary Figure, Figure 2D). Notably, NC2 exhibited higher intensity than NN2 in SARS-CoV-2-infected cells, probably due to its higher binding affinity (Figure 2D).

Further affinity maturation of SARS-CoV-2 N-targeting monobodies using single round selection of a high-complexity variant library
We next sought to further improve the affinity of the monobodies specifically towards SARS-CoV-2 N by in vitro maturation. NN2 and NC2 were used as parentals, and a large variant library was generated for each of them. As we previously showed that functional interactions usually occur between nearby residues (Olson et al., 2014), we chose to cover all possible variations at two adjacent residues in the BC and FG loops. The two flexible loops were diversified independently and combined orthogonally, resulting in about 10.5 million amino acid variants of each monobody (Figure 3A). This diversification strategy gives us the flexibility to discover residue interactions at nearby positions and between the two loops, while controlling the library size to enable a single round of selection. The libraries also had a ~0.1% spike-in of parental NN2 or NC2 as the internal positive control. Following library construction, Strep-tagged SARS-CoV-2 N was expressed in 293T cells, purified and immobilized onto beads as the bait. A single round of mRNA display of the variant libraries was performed with N at two amounts, low (~1 µg) and high (~10 µg), representing different selection pressures. We also included the nucleoprotein (NP) of influenza A virus (IAV) as a bait for the negative control. Lastly, deep sequencing was used to identify variants and quantify the relative frequency of each variant in the libraries.
Sequencing data revealed even representation of variations for all positions in both loops in the input library (Supplementary Figure 3A&B). In total, 1.74 million NN2 variants and 1.82 million NC2 variants with a read depth of more than 5 in the input library were considered high-confidence candidates and included in downstream analysis. The input read depth of variants correlated well with the read depth after SARS-CoV-2 N selection, but not with that after IAV NP selection (Supplementary Figure 3C). Quantitatively, 24.9-32.4% of NN2 variants and 25.1-33.5% of NC2 variants were identified after selection with SARS-CoV-2 N, while only 8.4% of NN2 variants and 1.3% of NC2 variants were identified with IAV NP. This indicated efficient and specific selection with SARS-CoV-2 N for both monobody libraries, but not with IAV NP. To quantify the relative binding affinity of each variant, we calculated the relative enrichment (RE) score as the ratio of its frequency in the selected library to that in the input library, normalized to the same ratio for its corresponding parental sequence. As expected, the histogram of RE scores showed clusters of enriched variants with SARS-CoV-2 N as bait, but not with IAV NP (Figure 3B). The RE scores correlated well between the two selection conditions of N (Figure 3C). The NN2 library contained more highly enriched variants (RE score > 10 for both conditions) compared with NC2, probably due to the larger room for improvement when starting from NN2, which has the lower affinity (Figure 3C). To provide a global visualization of amino acid preference in the selected libraries, we calculated the averaged RE scores of all variants with a given substitution at each amino acid position (Figure 3D). The averaged RE scores were normalized across all randomized positions to reveal the preference of substitutions at different residues. In general, selection with low and high amounts of SARS-CoV-2 N provided similar heatmaps. NN2 and NC2 showed different preferences at the varied residues. For NN2, the highly enriched variations clustered mostly at the N terminus of the BC loop and were relatively dispersed on the FG loop. For NC2, the N terminus of the BC loop and the C terminus of the FG loop showed more enrichment upon substitution. Our results suggested efficient affinity enhancement of monobodies by one round of in vitro maturation.
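The RE score described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function name, the dictionary-based count tables and the toy read counts are hypothetical, and the read-depth cutoff (more than 5 reads in the input library) is taken from the text.

```python
def relative_enrichment(selected_counts, input_counts, parent, min_input_depth=5):
    """RE = (selected frequency / input frequency) of a variant,
    divided by the same ratio for the parental sequence."""
    selected_total = sum(selected_counts.values())
    input_total = sum(input_counts.values())

    def enrichment(variant):
        selected_freq = selected_counts.get(variant, 0) / selected_total
        input_freq = input_counts[variant] / input_total
        return selected_freq / input_freq

    parent_enrichment = enrichment(parent)
    if parent_enrichment == 0:
        raise ValueError("parental sequence was not recovered after selection")

    # Keep only variants with input read depth above the cutoff.
    return {
        variant: enrichment(variant) / parent_enrichment
        for variant, depth in input_counts.items()
        if depth > min_input_depth
    }


# Toy read counts (hypothetical), keyed by variant name.
input_counts = {"parent": 100, "variant_a": 50, "variant_b": 50, "variant_c": 3}
selected_counts = {"parent": 100, "variant_a": 200, "variant_b": 10}

scores = relative_enrichment(selected_counts, input_counts, parent="parent")
for variant, score in scores.items():
    print(variant, round(score, 2))
```

By construction the parental sequence scores 1, so an RE score above 1 means the variant was enriched more strongly than its parent under the same selection.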
Incorporating machine learning for the validation of 3rd generation monobodies
The high-throughput measurement enables us to survey a large number of variants. However, there are uncertainties due to many factors, including sampling bias and sequencing errors. Artificial Neural Network (ANN) models are generic classifiers that can extract underlying patterns from different types of data, minimizing the uncertainties. They have been widely adopted to facilitate sequence-to-structure or sequence-to-function prediction (Fernandez-de-Cossio-Diaz et al., 2020; Girija, 2016; Liu et al., 2018; Senior et al., 2020). To identify the sequence patterns more likely to be responsible for the increased binding affinity and to rule out stochastic errors in candidate selection, we trained an ANN model (Figure 4A) to predict the RE scores of variants based on their amino acid sequences. 90% of the experimental data was fed to the model and the remaining 10% was used for validation. The 17-residue mutagenized region was parsed into a 357-dimension boolean array, and the RE scores were parsed into 10 quantiles as output. The predicted RE score was calculated as the weighted average of the output array. The predictions faithfully captured the patterns of sequences with increased affinity (Figure 4B). They correlated significantly with experimental enrichment with SARS-CoV-2 N as bait for the 10% validation data, as well as for the whole dataset, but not with IAV NP (Supplementary Figure 3D&3E).
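The input and output encodings described above can be illustrated without any machine learning library. A 357-dimension boolean array for 17 residues implies 21 symbols per position; the sketch below assumes these are the 20 amino acids plus one extra symbol (written here as "*"), which the text does not specify. The function names, the example sequence and the representative value assigned to each quantile bin are likewise hypothetical.

```python
# Assumed alphabet: 20 amino acids plus one extra symbol, 21 in total.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY*"
REGION_LENGTH = 17


def one_hot(region):
    """Encode a 17-residue region as a flat boolean list of length 17 * 21 = 357."""
    if len(region) != REGION_LENGTH:
        raise ValueError("expected a 17-residue region")
    encoded = [False] * (REGION_LENGTH * len(ALPHABET))
    for position, residue in enumerate(region):
        encoded[position * len(ALPHABET) + ALPHABET.index(residue)] = True
    return encoded


def predicted_re(bin_weights, bin_values):
    """Weighted average of per-bin representative RE values,
    using the model's 10 output weights as the weights."""
    if len(bin_weights) != len(bin_values):
        raise ValueError("weights and values must have the same length")
    total = sum(bin_weights)
    return sum(w * v for w, v in zip(bin_weights, bin_values)) / total


vector = one_hot("ACDEFGHIKLMNPQRST")  # hypothetical 17-residue sequence
print(len(vector), sum(vector))

# Hypothetical representative RE value for each of the 10 quantile bins.
bin_values = [0.1, 0.3, 0.5, 0.8, 1.0, 1.5, 2.5, 5.0, 10.0, 20.0]
bin_weights = [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.2, 0.1]
print(round(predicted_re(bin_weights, bin_values), 2))
```

Treating the output as a distribution over quantile bins and averaging over it gives a single continuous score per variant, which is what makes a ranking such as "top 10% of predictions" possible.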
As NN2 had a lower affinity to N (Figure 1D) and had more highly enriched candidates during the in vitro maturation (Figure 3B&3C), we expected that it has more room for improving binding affinity. We hence focused on NN2 for validating potential 3rd generation NTD binders. First, by considering only the experimental RE scores, we identified 1170 candidates with RE scores >10 in both N selection conditions and an RE score <1 under IAV NP selection. Second, by integrating the ANN prediction results (top 10%), we narrowed the list down to 134 candidates (Supplementary Table 1). We picked 4 clones as representatives of the 3rd generation NTD binders (NN3-1, NN3-2, NN3-3 and NN3-4), with a bias towards hydrophilic residues on the loops (Supplementary Figure 3F). All 4 monobodies were expressed and purified. NN3-1 showed the highest expression level in E. coli and improved solubility compared with NN2. SPR analysis suggested that NN3-1 showed improvement in both on-rate (4.1 × 10^5 M^-1 s^-1) and off-rate (7.4 × 10^-3 s^-1) when binding to SARS-CoV-2 N. This results in a Kd=18.1 nM, corresponding to a 1.5-fold improvement over parental NN2 (Figure 4E). To more accurately measure the binding affinity between NN3 and N-NTD in solution, we purified the NTD from E. coli and performed isothermal titration calorimetry (ITC). Similar to the results from SPR, NN2 had a Kd of 17.36 nM, while NN3 improved to 14.43 nM (Figure 4F). Consistently, when comparing the staining intensity of monobodies with that of an anti-SARS-N antibody in virus-infected cells, we noticed that NN2 showed a weaker signal than the antibody in SARS-CoV-2-infected cells, but similar intensity in SARS-infected cells. In contrast, staining with NN3-1 showed a much stronger signal in SARS-CoV-2-infected cells and a decreased signal in SARS-infected cells, supporting that NN3-1 has improved affinity and selectivity for SARS-CoV-2.
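The two-step candidate filter described at the start of this section can be sketched as follows. The thresholds (RE > 10 in both N conditions, RE < 1 with IAV NP, top 10% of ANN predictions) come from the text; the record layout and field names are hypothetical, and ranking the ANN predictions across all variants rather than only across the 1170 experimental hits is an assumption.

```python
def select_candidates(variants, re_min=10.0, neg_max=1.0, top_fraction=0.10):
    """Return sequences that pass the experimental RE filter and also fall
    within the top fraction of ANN-predicted scores."""
    # Rank all variants by predicted score and keep the top fraction.
    ranked = sorted(variants, key=lambda v: v["ann_score"], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top_predicted = {v["seq"] for v in ranked[:n_top]}

    selected = []
    for v in variants:
        enriched = v["re_low_n"] > re_min and v["re_high_n"] > re_min
        specific = v["re_iav_np"] < neg_max
        if enriched and specific and v["seq"] in top_predicted:
            selected.append(v["seq"])
    return selected


# Toy records (hypothetical values).
toy = [
    {"seq": "variant_a", "re_low_n": 15.0, "re_high_n": 12.0, "re_iav_np": 0.4, "ann_score": 0.9},
    {"seq": "variant_b", "re_low_n": 30.0, "re_high_n": 25.0, "re_iav_np": 3.0, "ann_score": 0.8},
    {"seq": "variant_c", "re_low_n": 11.0, "re_high_n": 14.0, "re_iav_np": 0.2, "ann_score": 0.1},
]
print(select_candidates(toy, top_fraction=0.5))
```

Requiring agreement between the measured enrichment and the model prediction is a way to discard variants whose high RE score rests on only a few reads.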

Development of ELISA for N detection using monobodies
To demonstrate the potential use of monobodies in viral detection, we developed a sandwich ELISA for SARS-CoV-2 N detection. To assess the feasibility of the assay, we first used a monobody as the capture protein and a commercial anti-SARS N antibody for detection (i.e. single monobody-based assay). As capture proteins are usually required in larger amounts than detection proteins, the ease and efficiency of monobody production in E. coli make monobodies an appealing replacement for traditional antibody-based capture in ELISA. The standard curve was established with purified recombinant N diluted in BSA. Using the colorimetric TMB substrate, the NN2-coated plate could detect N at a limit of 1 ng/ml, while the NC2-coated plate had a sensitivity of 360 pg/ml (Supplementary Figure 4A&4B). The detection sensitivity could be improved with a chemiluminescence substrate, reaching 320 pg/ml for NN2 and 64 pg/ml for NC2 (Figure [...]).

To examine the detection sensitivity and specificity of the above single monobody-based ELISA for N in cell culture, we obtained viral culture supernatants of SARS-CoV-2, MERS, and IAV (strain: A/Cal/04/09). The viral supernatants were inactivated in 0.5% Triton X-100, and serially diluted in culture media plus 0.5% Triton X-100. From the standard curve constructed with recombinant N, we estimated that the concentration of N was about 26.4 ng/ml in the supernatant (Figure [...]).

The above results illustrated that our monobodies could be paired with an antibody to detect circulating N. We then demonstrated that the ELISA could work by pairing the two monobodies NC2 and NN2 (i.e. dual monobody-based assay). Firstly, we coated HIS-tagged NC2 onto a plate and detected serially diluted recombinant N using FLAG-tagged NN2 (Figure 5F). An anti-FLAG antibody was used for detection. Using a chemiluminescence substrate, we could detect N at a concentration of 10 pg/ml. Secondly, to simplify the experimental procedure, we constructed an NN2 protein with three cysteines at the C terminus, enabling direct HRP conjugation through activated sulfhydryl groups. Using NN2-HRP as the detection agent, we could detect recombinant N at 1.5 ng/ml with TMB substrate, and at around 10 pg/ml with chemiluminescence substrate (Figure 5G, Supplementary Figure 4E).
As we demonstrated above that the 3rd generation monobody NN3-1 has a higher affinity to N than NN2, we examined whether using NN3-1 can improve the detection sensitivity for N. As expected, the single monobody-based assay using NN3-1 as the capture protein improved the sensitivity of the ELISA from 1 ng/ml to 316 pg/ml, using TMB substrate.
Together, we presented a platform for identifying and evolving monobodies against SARS-CoV-2 N (Figure 6). We first created millions of variants on the backbone of a parental monobody through high-density mutagenesis. Through mRNA display followed by deep sequencing, we were able to systematically profile the relative affinity of these variants against the viral antigen. Machine learning can be integrated to facilitate the validation of positive hits. The binding affinity and specificity can be determined by various biological and biochemical assays. As demonstrated by the high sensitivity of the ELISA developed from the monobody pair, the combined advantages of high binding affinity and specificity, low cost, and small protein size support the use of monobodies as antigen detection reagents for developing diagnostic methods for emerging infectious diseases, including but not limited to COVID-19.

Discussion
COVID-19 is a rapidly evolving global pandemic. An effective and scalable detection method for early identification and isolation of infected individuals is hence critical for controlling the spread of COVID-19. In view of the high cost, low sensitivity, and long turnaround time of current diagnostic techniques, we presented a platform for developing high-affinity monobodies recognizing viral antigens. In particular, we developed affinity-enhanced monobodies that specifically recognize N both in vitro and in vivo. The two monobodies have a high affinity to SARS-CoV-2 N, with Kd in the nM range, and can be mass-produced in E. coli. They can recognize and, together, form a protein complex with endogenous N. The potential utility of this pair of monobodies was shown with an ELISA that has relatively high sensitivity and specificity.
Compared with antibodies, our monobody-based platform has the following advantages. First, we can rapidly and iteratively evolve monobodies to target a selected viral antigen. By coupling mRNA display with high-throughput sequencing, the screening and validation process takes about two weeks, faster than traditional antibody production involving animal immunization and the isolation and purification of antibodies. It is therefore suitable for emerging situations like COVID-19. Second, it is relatively low-cost through mass production in E. coli, demonstrating great scalability. Third, in addition to sensitivity and specificity, monobodies with desired properties can be generated through different selection pressures. For example, performing the selection at an elevated temperature or with a protease challenge generates monobodies with higher thermal stability and protease resistance. Such selection is hardly customizable for antibodies, whose production is restricted to physiological conditions. Last, monobodies are significantly smaller and less complex than multi-domain antibodies, possibly achieving higher density in nanodevices to improve sensitivity, and higher permeability into cells for imaging. Thus, complementary to antibodies, monobodies can potentially be applied to developing diagnostic assays and enable population-wide screening in a cost- and time-effective manner.
As a proof of concept, we presented the application of monobodies through the establishment of an ELISA. Although RT-qPCR is the current standard test for diagnosing COVID-19, it has specific equipment requirements and laborious procedures that are prone to contamination without dedicated facilities. For the diagnosis of SARS, we and others have developed ELISA or chemiluminescent enzyme immunoassay (CLIEA) using antibodies to detect circulating viral N with a sensitivity of 80-100% (Che et al., 2004a, 2004b; Fujimoto et al., 2008; Li et al., 2005). In SARS-infected patients, a high concentration of viral N has been detected in multiple body fluids as early as 1 day after the onset of symptoms, making it an ideal target for early viral detection. The concentration of N in patient sera is estimated to be around 100 pg/ml to 3.2 ng/ml (Che et al., 2004a). Given the high similarity of viral genome and clinical course between SARS and SARS-CoV-2, we postulate that the concentration range of SARS-CoV-2 N in patient sera will be similar to that of SARS. For our established ELISA using monobodies, we achieve a detection sensitivity of around 10 pg/ml, which is the estimated lower boundary of N concentration in SARS-CoV-2 patients. To make it clinically applicable, we are optimizing the conditions to, on one hand, further improve the detection limit by reducing background noise, and on the other hand, enable long-term storage of monobody-coated ELISA plates. Different formulations of dispensing solution will be tested for their ability to immobilize the monobodies on surfaces with long-term stability. Parameters to be varied include but are not limited to the buffer type, pH, and the addition of stabilizing sugars, glycerol, ethanol, surfactants and polymer additives.
Besides ELISA, our monobody-based system may be compatible with other point-of-care detection platforms, including LFIA strips and nanobiosensors. By conjugating the monobodies with gold nanoparticles, LFIA strips can detect the presence of N in biological samples through simple visualization. Miniature devices like nanobiosensors have also attracted great attention due to their short detection time, high sensitivity and accuracy for real-time analysis with small amounts of sample (Lee et al., 2008; Noah and Ndangili, 2019; Singh et al., 2016; Vu and Chen, 2019). The number of immobilized binders is a critical factor determining the performance of sensors. Owing to their small size, the higher density and larger surface-to-volume ratio of monobodies may enable the detection of antigens at low concentrations. In fact, we previously demonstrated that our monobody can be coated onto a nanobiosensor and detect circulating N at nM concentration, suggesting the potential of using monobodies as detection reagents on nanosensors (Ishikawa et al., 2009). Furthermore, our small monobodies can possibly act as in vivo imaging probes for basic and clinical research, compatible with single-photon emission computed tomography (SPECT) or positron emission tomography (PET). Molecular imaging has been developed for monitoring inflammation in lung diseases and viral dynamics in primates (Gordon et al., 2019; Lepin et al., 2010; Santangelo et al., 2015), and often uses antibodies as detection probes. Despite their high sensitivity, antibodies are large and thus show low tissue penetration and slow clearance. As an alternative, monobodies may be able to improve tissue permeability and speed up clearance, thereby reducing background noise and allowing early imaging. Moreover, the small monobody can be efficiently packed into nanocapsules and potentially delivered to target cells, eventually allowing intracellular tracing of endogenous antigens (Abellan-Pose et al., 2016; Wen et al., 2019).
In summary, we have presented a platform for developing and evolving monobodies as viral antigen recognition elements, comprising high-density mutagenesis, mRNA display, deep sequencing and machine learning. It can potentially be applied to developing various diagnostic techniques and to screening viral detection reagents against emerging communicable diseases, including but not limited to COVID-19.

Declarations
Acknowledgement: The work on COVID-19 and SARS-CoV-2 is supported by the UCLA Innovation Fund and the UCLA AIDS Institute.

Conflict of Interests
Authors declare no conflict of interests.

Sequence analysis
Sliding window similarity was calculated using Sequence Manipulation Suite 2, with a window size of 40 residues. Residues 45-175 were highlighted as the NTD and residues 248-365 were highlighted as the CTD.
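To make the windowed comparison concrete, the sketch below computes percent identity in sliding windows over two pre-aligned sequences. It is not the Sequence Manipulation Suite implementation: it scores identity only (a similarity score would additionally need groups of similar residues), the function name is hypothetical, and treating gap characters as mismatches is an assumption. The window size of 40 follows the text.

```python
def sliding_identity(aligned_a, aligned_b, window=40):
    """Percent identity in each window of two aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or window > len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")

    # A position matches when both residues are equal and not a gap.
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [
        100.0 * sum(matches[start:start + window]) / window
        for start in range(len(matches) - window + 1)
    ]


# Toy aligned sequences (hypothetical), 50 residues with one mismatch at the start.
seq_a = "M" + "S" * 49
seq_b = "A" + "S" * 49
profile = sliding_identity(seq_a, seq_b, window=40)
print(len(profile), profile[0], profile[-1])
```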
The average distance tree was constructed using MEGA. The distance matrix for constructing the tree was BLOSUM62. The sequences used in the tree were reference genome sequences from NCBI Nucleotide. Entropy was calculated using GISAID SARS-CoV-2 data. All viral sequences were first aligned by MAFFT, followed by annotation and extraction of the sequenced N coding genes for comparison. The entropy at each site was calculated using scipy.
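The per-site entropy can be sketched with the standard library alone. The analysis above used scipy; this stand-in computes the same Shannon entropy with the natural logarithm, which is also the default base of `scipy.stats.entropy`. The function names and the toy alignment are hypothetical, and counting gap characters as an ordinary symbol is an assumption.

```python
from collections import Counter
from math import log


def site_entropy(column):
    """Shannon entropy (natural log) of the residues observed at one alignment column."""
    total = len(column)
    counts = Counter(column)
    return sum(-(n / total) * log(n / total) for n in counts.values())


def alignment_entropy(aligned_sequences):
    """Entropy at every column of a set of equal-length aligned sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return [site_entropy(column) for column in zip(*aligned_sequences)]


# Toy alignment (hypothetical): first column conserved, second column variable.
toy_alignment = ["MS", "MS", "MA", "MA"]
print(alignment_entropy(toy_alignment))
```

A fully conserved column has zero entropy, so low values across the NTD and CTD are what supports their use as stable detection targets.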

Expression and purification of monobodies
DNA encoding the monobodies was cloned into the pET11 plasmid (HIS tag) or the pAO9 plasmid (MBP tag) and transformed into BL21 competent cells. BL21 cells containing the plasmid were inoculated into 5 ml LB containing 50 µg/ml ampicillin or kanamycin and cultured overnight at 37 °C and 200 rpm. The overnight culture was transferred to fresh LB medium with ampicillin or kanamycin at a 1:100 ratio, and OD600 was monitored until it reached 0.4. IPTG was added to a final concentration of 200 µM, and the culture was grown overnight at 18 °C (~20 hours). Cells were pelleted by centrifugation at 8000 g for 20 minutes and lysed in B-PER buffer with protease inhibitors. Affinity purification was performed using glutathione Sepharose 4B resin for MBP-tagged monobody, or a Ni column for HIS-tagged monobody. The protein fraction was dialyzed against ion-exchange buffer A (20 mM Tris-HCl pH 8.0; 75 mM NaCl; 6 mM β-ME) overnight at 4 °C. After dialysis, the protein sample was loaded onto a Q-HP column (GE) and eluted with a linear gradient to buffer B (20 mM Tris-HCl pH 8.0; 1000 mM NaCl; 6 mM β-ME). Target protein fractions were collected and concentrated. Gel filtration was performed using HiLoad Superdex 75 prep grade (GE) equilibrated with buffer (20 mM Tris-HCl pH 8.0; 150 mM NaCl; 2 mM DTT). Aggregates were removed, and the target protein was eluted and pooled. The protein sample was concentrated and stored at -80 °C. Protein concentration was measured with Coomassie Plus Reagent.

Expression and purification of viral N
Open reading frames of viral proteins were cloned into the pcDNA3 mammalian expression plasmid with a Strep-tag at the C terminus. HEK293T cells were transfected with the plasmid using Lipofectamine 3000. Cells were collected 24 hours post transfection and lysed by sonication in lysis buffer (20 mM Tris pH 8.0, 500 mM NaCl, 1 mM EDTA, 1 mM DTT, 0.5% Triton X-100, supplemented with protease inhibitor cocktail, 1 mM NaF, 2 mM Na3VO4 and 2 mM β-glycerophosphate). Cell lysates were incubated with MagStrep "type3" XT beads overnight at 4 °C with constant agitation and washed 5 times with lysis buffer. Proteins were eluted with Elution Buffer BXT (IBA) for 30 minutes at 37 °C and dialyzed against HBS-E buffer (10 mM HEPES pH 7.5, 150 mM NaCl, 1 mM EDTA).

Surface plasmon resonance (SPR)
Binding kinetics between monobodies and N were measured with a Biacore T100 instrument. Various concentrations of monobodies were flowed over a blank CM5 chip and an N-bound chip at 100 µl per minute for 120 seconds. Dissociation was monitored for 10 minutes before regeneration with 5 mM sodium hydroxide at 30 µl per minute for 10 seconds. Kinetic data were obtained by fitting with the Biacore evaluation software. For the double binding experiments, purified monobodies were flowed at 100 µl per minute at increasing concentrations. The two monobodies were flowed for 120 seconds separately at 300 nM and then together at 300 nM each, with regeneration between injections.

Isothermal titration calorimetry (ITC)
The binding affinity between affibody and N_Nter protein was measured at 25 °C by isothermal titration calorimetry (MicroCal iTC200, Malvern Panalytical). The affibody and N_Nter protein were in the same buffer (20 mM Tris-HCl pH 8.0, 150 mM NaCl, 2 mM DTT). The protein concentration ranged from 0.3 to 0.6 mM in the syringe and from 0.015 to 0.06 mM in the reaction cell. After excluding the first injection, all titration data were analyzed with MicroCal ITC-ORIGIN Analysis Software (Malvern Panalytical).

Enzyme-linked immunosorbent assay (ELISA)
Monobody protein stock was diluted in 1X coating buffer (Bio-Rad) to a working concentration of 6.4 µg/ml. 50 µl of working solution was added to each well and the plate was incubated overnight at 4 °C to coat the binder onto the plate. After coating, the binder working solution was discarded and the plate was washed twice with washing buffer (PBS, 0.05% Tween-20). For blocking, 100 µl of blocking buffer was added to each well and the plate was incubated for 2 hours at 37 °C. The blocking buffer was then removed, and the plate was washed 3 times with washing buffer. For N spike-in experiments, sample solution was diluted in 1% BSA/washing buffer containing 0.5% Triton X-100 or human serum specimens; for viral culture, sample solution was diluted in viral medium (1% FBS in DMEM medium) containing 0.5% Triton X-100. Next, 100 µl of sample solution was added to each well and incubated for 1 hour at 37 °C. Afterwards, the plate was washed 5 times with washing buffer. Primary antibody (anti-SARS N rabbit polyclonal antibody, Rockland) was diluted 1:500 in washing buffer. 50 µl of primary antibody solution was added to each well and incubated for 1 hour at 37 °C. The plate was then washed 5 times with washing buffer. Secondary antibody (anti-rabbit IgG-HRP, Cell Signaling Technology) was diluted 1:5000 in washing buffer. 50 µl of secondary antibody solution was added to each well and incubated for 30 minutes at 37 °C. After 5 washes with washing buffer, 50 µl of chemiluminescence substrate solution (SuperSignal™ ELISA Pico Substrate, Thermo Scientific) was added to each well and incubated for 1 minute at room temperature. The chemiluminescence signal was read using a plate reader (Varioskan LUX Multimode Reader). For readings at OD450, 100 µl of colorimetric TMB substrate (1-Step™ Ultra TMB-ELISA Substrate Solution, ThermoFisher) was added to each well and incubated for 15 minutes at room temperature. The reaction was stopped using 2 M sulfuric acid.
For the full monobody-based ELISA, NC2 was used to coat the plate at 640 ng/well. NN1 with a C-terminal FLAG tag was used at 1 ng/ml in the primary antibody step. DYK rabbit polyclonal antibody was then used to detect the FLAG tag, followed by the secondary antibody (anti-rabbit IgG-HRP, Cell Signaling Technology).
The use of human serum specimens for testing was approved by the Institutional Review Board of The University of Hong Kong / Hospital Authority Hong Kong West Cluster.

HRP conjugation of monobody
NC2 monobody with three cysteines on the C terminus was used for HRP conjugation. 100 µl of Maleimide Conjugation Buffer was added to 2-MEA, and incubated for 90 minutes at 37 °C. Activated HRP was conjugated to NC2 monobody by incubating for 2 hours at room temperature.

Single-walled carbon nanotube nanobiosensor detection
Our carbon nanotube devices were prepared as follows. First, carbon nanotubes were grown on a Si/SiO2 substrate by chemical vapor deposition (CVD) using iron nanoparticles as catalysts. Metal electrodes were then defined by depositing Cr/Au through a shadow mask. Nanotubes located outside the channel were removed by oxygen plasma treatment, while nanotubes inside the channel were protected with polymethyl methacrylate. The nanotube surface was functionalized with linker molecules by soaking the devices in a 0.1 mM solution of 1-pyrenebutanoic acid succinimide ester in dry methanol for 1 h, followed by extensive washing with methanol. This linker is a bifunctional molecule with a pyrene derivative at one terminus and a succinimide ester group at the other. The pyrene group is known to bind to the nanotube surface by π-π interaction, and the succinimide ester group is reactive towards nucleophilic groups such as amines. The device was then submerged in a 0.01 mM aqueous solution of BMPA for 2 h, which added maleimide terminals to the nanotube surface. NN2 was subcloned into pAO9 with a C-terminal cysteine and purified as described above. NN2-Cys was immobilized on the nanotubes overnight in PBS. After extensive washing, a baseline was established using increasing concentrations of BSA in 0.01% PBS. N protein was prepared as described above and buffer-exchanged into 0.01% PBS (PD-10, GE Healthcare) before application to the nanobiosensor.

Immunofluorescence assay (IFA)
Vero E6 cells were used for infection. At 24 hours post infection, cells were fixed with 4% PFA/PBS for 5 minutes at room temperature. After fixation, cells were permeabilized with 0.3% Triton X-100/PBS for 10 minutes and blocked with 5% BSA/TBST (20 mM Tris 2.0, 140 mM NaCl, 0.05% Tween-20) for 1 hour. MBP-NN2 or MBP-NC2 was diluted in 1% BSA/TBST to 2 µg/ml and cells were incubated with the binder solution for 1 hour. Then, anti-MBP monoclonal antibody (NEB, 1:2500), anti-SARS N rabbit polyclonal antibody (1:2500) and anti-MERS N guinea pig antibody (1:1000) were added to the slide chamber and incubated for 1 hour. Anti-mouse Alexa Fluor-594 and anti-rabbit Alexa Fluor-488 (or anti-guinea pig Alexa Fluor-647) were diluted to 2 µg/ml to label the corresponding primary antibodies. Nuclei were stained with Hoechst 33324 and slides were fixed again with 1% PFA/PBS for 5 minutes. Finally, slides were mounted with ProLong™ Diamond Antifade Mountant (Thermo Fisher) and cured for 24 hours at room temperature before image acquisition.

Immunoprecipitation (IP)
Immunoprecipitation experiments were performed with HIS-, Strep- and FLAG-tagged proteins expressed in 293T cells. Briefly, cells were transfected with the corresponding expression plasmids using Lipofectamine 2000 reagent (Invitrogen) and lysed 2 days post-transfection with RIPA buffer (50 mM Tris-HCl pH 7.4, 0.5% NP-40, 150 mM NaCl, 1 mM EDTA and protease inhibitors). Cell lysates were incubated with Strep-Tactin XT beads (IBA) or anti-M2 FLAG beads (Sigma-Aldrich) overnight at 4 °C with constant agitation, washed 5 times with RIPA buffer and eluted with 60 µl of SDS-PAGE sample buffer. All samples were subjected to SDS-PAGE and western blotting analysis. For the sequential IP assay, protein complexes were eluted with 3XFLAG peptide (Sigma), subjected to a second immunoprecipitation with Strep-Tactin XT beads, and finally eluted with 60 µl of SDS-PAGE sample buffer for western blotting.

RNA extraction, reverse transcription and real-time PCR
Viral RNA was extracted using the QIAamp Viral RNA Mini Kit (Qiagen Sciences) and reverse transcribed with Superscript III Reverse Transcriptase (Thermo Fisher). Quantitative real-time PCR was performed using Taq polymerase and SYBR Green. For viral copy number, a standard curve ranging from 1×10^3 to 1×10^8 was used for quantification.
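As an illustration of how a standard curve converts a threshold cycle (Ct) into a copy number, the sketch below fits Ct against log10 of the standard copy number by least squares and inverts the fit for an unknown sample. The function names and all Ct values are hypothetical and chosen only for illustration; the source does not describe how the curve was fitted.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

# Standards spanning 1e3 to 1e8; the Ct values are invented for illustration
standards = [1e3, 1e4, 1e5, 1e6, 1e7, 1e8]
cts = [30.0, 26.7, 23.4, 20.1, 16.8, 13.5]
slope, intercept = fit_standard_curve(standards, cts)
estimated_copies = copies_from_ct(21.75, slope, intercept)
```

Samples whose Ct falls outside the range covered by the standards would require extrapolation and are usually treated with caution.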

Construction of doped library
The NN1 and NC1 doped libraries were constructed similarly to the naïve library previously described. For the NC1 library, Fnoligo3 was replaced with (5'-CCAGCCTCCTGATCAGCTGG78S55758686685755768SCGCTACTACCGCATCACCTACG) and Fnoligo7 was replaced with . For the NN1 library, Fnoligo3 was replaced with (5'-CCAGCCTCCTGATCAGCTGG77S67S557857857557587CGCTACTACCGCATCACCTACG) and Fnoligo7 was replaced with (5'-CGGTAGTTGATGGAGATCGGS76775S56S65685758S76775786S68CGTGACGGCGTACACCGTGA).

Construction of variant library for the 3rd generation binders
Based on the parental sequences (NN2 and NC2), we introduced all possible random mutations into each pair of adjacent codons, using synthesized oligos containing an NNSNNS sequence, where N stands for a uniform mixture of A/T/C/G and S stands for G or C. We used 8 oligos to cover the full length of 10Fn3 and assembled them by PCR. Each library was expected to contain 56.6 million nucleotide variants, i.e. 10.5 million amino acid variants.

mRNA display and sequencing
The 10Fn3 library (DNA templates) was transcribed by T7 polymerase (NEB) and ligated to the pF30P linker (Phospho-AAAAAAAAAAAAAAAAAAAAA-spacer9-spacer9-spacer9-ACC-puromycin) via the splint oligonucleotide by T4 DNA ligase (NEB). After purification and isolation of ligated mRNA templates, in vitro translation was performed using reticulocyte lysate (Promega), followed by incubation with KCl (500 mM) and MgCl2 (60 mM) for 30 minutes at room temperature to enhance fusion formation. The mRNA-protein fusions were then affinity-purified using M2 anti-FLAG beads (Sigma-Aldrich). After elution with 3XFLAG peptides, the fusions were reverse transcribed with Superscript III (Invitrogen), and a fraction of the purified sample was reserved to determine the frequency of each coding sequence in the input library. The purified fusion sample was incubated with bait protein for 2 hours at room temperature. After extensive washing (6 rounds at room temperature), the immobilized fusion samples were eluted by heat (95 °C) and PCR amplified. The amplified DNA fragments from the input and post-selection samples were then prepared for high-throughput sequencing on a NovaSeq 6000 (PE250).

Sequencing data analysis
Data were analyzed with custom bash and Python scripts. Paired-end fastq reads were de-multiplexed into the corresponding samples and mapped to the corresponding parental monobody sequences. We reached a depth of ~50 million reads for the input libraries. The distribution of read depth centered around 10 reads, indicating that most variants in the library were sufficiently covered. Variants with a read depth of more than 5 in the input library were considered high-confidence candidates and included in downstream analysis. The relative enrichment (RE) score for each variant was calculated as the ratio of its relative frequency in the selected library to that in the input library, normalized to the same ratio for its corresponding parental sequence.
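The RE calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deposited analysis script: the function name, the dictionary-based count tables and the toy read counts are hypothetical, and variants absent from the selected library are assumed to have a frequency of zero (no pseudocount).

```python
def relative_enrichment(input_counts, selected_counts, parent, min_input_reads=5):
    """RE = (f_selected / f_input) of a variant divided by the same ratio for the parent.

    Only variants with more than `min_input_reads` reads in the input library
    are scored, mirroring the read-depth filter described in the text.
    """
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    parent_ratio = (selected_counts[parent] / total_sel) / (input_counts[parent] / total_in)

    scores = {}
    for variant, n_in in input_counts.items():
        if n_in <= min_input_reads:
            continue  # too few input reads to estimate a frequency reliably
        freq_in = n_in / total_in
        # Assumption: a variant never seen after selection has frequency 0
        freq_sel = selected_counts.get(variant, 0) / total_sel
        scores[variant] = (freq_sel / freq_in) / parent_ratio
    return scores

# Toy read counts (hypothetical): "WT" is the parental sequence
input_counts = {"WT": 100, "A": 20, "B": 10, "C": 3}
selected_counts = {"WT": 200, "A": 80, "B": 5}
re_scores = relative_enrichment(input_counts, selected_counts, parent="WT")
```

With this normalization the parental sequence has an RE of 1, so variants with RE above 1 are enriched relative to the parent and those below 1 are depleted.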

Deep learning
An artificial neural network model was constructed using TensorFlow 2.1.0. The 4-layer network consisted of a 357-node input layer, two 64-node dense layers and a 10-node output layer. The input and middle layers used leaky ReLU as the activation function. The model used the Adam optimizer with a learning rate of 0.01 and mean squared error (MSE) as the loss function. We parsed each mutant sequence into a 357-dimension Boolean array as input and categorized the RE scores into 10 quantiles as the output array. We trained a separate model for each library and selection condition, feeding 90% of the mRNA display data into the network for training. The accuracy of the models was 0.63-0.74 for N-selected libraries. We then used the model to predict the remaining 10% of the data. The predicted RE scores were calculated as the weighted average of the output array.
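The two steps that connect RE scores to the network output, binning into 10 quantiles and collapsing a 10-value output back into one predicted score, might look like the sketch below. It is written in plain Python rather than TensorFlow and rests on several assumptions not stated in the source: bins are assigned by rank so that each holds the same number of variants, the targets are one-hot vectors, the bin values used for averaging are the indices 0-9, and the network outputs are non-negative weights. All function names are hypothetical.

```python
def quantile_labels(scores, n_bins=10):
    """Assign each score a bin index 0..n_bins-1 by rank (equally populated bins)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(scores)
    return labels

def one_hot(label, n_bins=10):
    """One-hot target vector for a quantile label (assumed training target)."""
    return [1.0 if k == label else 0.0 for k in range(n_bins)]

def weighted_average(output, bin_values=None):
    """Collapse a network output into one score, weighting bin values by the outputs."""
    if bin_values is None:
        bin_values = range(len(output))  # assumption: bin index used as bin value
    total = sum(output)
    return sum(w * v for w, v in zip(output, bin_values)) / total

# Toy RE scores (hypothetical), two variants per decile
toy_scores = [0.1 * i for i in range(20)]
labels = quantile_labels(toy_scores)
predicted = weighted_average([0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0])
```

Averaging over the output array lets a prediction fall between two quantiles when the network splits its weight across neighbouring bins, which gives a finer ranking of candidates than taking the single highest-scoring bin.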

Data Availability
The sequencing data were deposited in the NIH Sequence Read Archive (SRA) under accession number PRJNA615649. Custom Python scripts for further data analysis were deposited on GitHub: https://github.com/Tian-hao/mRNAdisplay.

Maturation of NN1 and NC1 through mRNA display. BC and FG loops were randomized using a doped strategy and selected against purified SARS N. Six rounds of selection with increasing binding temperature were performed. The last two pools of each selection were cloned and sequenced. The monobodies with increased binding affinity were identified and confirmed. (C): Amino acid sequences of the two confirmed monobodies with improved affinity (NN2 and NC2) are shown. The sequences were compared with the parental sequences (NN1 and NC1). Yellow shading indicates the residues within the BC and FG loops, and red shading marks the residues that differ. (D): SPR analysis of the binding affinity of NN2 and NC2 to SARS-CoV-2 N.

Detection of N in viral culture supernatant using ELISA with SARS-CoV-2, MERS or IAV (strain: A/Cal/04/09). Viral culture was serially diluted 1:3 in viral medium/0.5% Triton X-100 (N=3). (F&G): Dual monobody-based ELISA for N detection. MBP-tagged NC2 was coated onto ELISA plates for capturing N.
Error bars indicate SD for all panels.

Detection sensitivity was calculated as the concentration of N at which the signal became statistically different from the BSA control. Error bars indicate SD. The negative control with no N added is shown at -5 on the curve.

Developing and optimizing monobodies against SARS-CoV-2 N. Diagram shows the developed platform for identifying, validating and evolving monobodies against SARS-CoV-2 N for potential diagnostic use.