An artificial intelligence nanopore platform for SARS-CoV-2 virus detection

High-throughput, high-accuracy detection of emerging viruses allows for pandemic prevention and control. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is used to diagnose the presence of SARS-CoV-2. The principle of the test is to detect viral RNA using a pair of primers that specifically bind to the base sequence of the viral RNA. However, RT-PCR is a sophisticated technique that requires a time-consuming pretreatment procedure to extract viral RNA from clinical specimens and to obtain high sensitivity. Here, we report a method for detecting novel coronaviruses with high sensitivity using artificial intelligence nanopores and a simple procedure that does not require RNA extraction. The artificial intelligence nanopore platform consists of machine learning software on servers, a portable high-speed and high-precision current measuring instrument, and scalable, cost-effective semiconducting nanopore modules. We show that the artificial intelligence nanopores accurately identify four types of coronaviruses, HCoV-229E, SARS-CoV, MERS-CoV, and SARS-CoV-2, which are usually extremely difficult to detect. Positive/negative diagnosis of the new coronavirus is achieved with a sensitivity of 95 % and a specificity of 92 % in a 5-minute measurement. The platform enables high-throughput diagnostics with low false negatives for the novel coronavirus.


Introduction
Human coronavirus HCoV-229E is one of the first coronavirus strains reported to be associated with nasal colds [1]. In the past 20 years, other species of coronaviruses, namely those causing Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), have caused a pandemic in the form of severe respiratory illness [2]. Recently, SARS-CoV-2, the seventh species of coronavirus, has spread all over the world, causing the outbreak of an acute respiratory disease [3][4][5][6][7][8][9][10]. In the absence of vaccines and effective treatments, testing and quarantine are needed to control the transmission of the virus. Currently, RT-PCR is the gold standard for SARS-CoV-2 testing; it is based on the principle that the single-stranded RNA present in this virus and the primer form a double helix. However, the RT-PCR method is prone to false negatives, which increase the risk of viral infection. Other methods that minimize false negatives are needed for the initial diagnosis and the judgment of recovery [11]. Further, the extraction and purification of viral RNA prior to the genetic test is time-consuming, and the exposure during this process increases the risk of the inspector contracting the virus. Therefore, there is a need for an inspection method with higher sensitivity and throughput [11].
Nanopores are through-holes in a substrate with diameters ranging from several nanometers to several hundreds of nanometers [12][13][14][15][16][17]. Low-aspect-ratio solid-state nanopores, with a thickness/diameter ratio <1, are used for the detection of DNA, proteins, viruses and bacteria [18]. When a virus is transported from the cis side to the trans side by electrophoretic force, the ionic current decreases (Figs. 1a, 1b). The ionic current versus time waveform obtained from the nanopore carries information on the volume, structure, and surface charge of the target being analyzed [18]. At the laboratory level, it has been demonstrated that classifying the waveform data using artificial intelligence can directly identify a single virus with high accuracy, without extracting the genome [18]. However, obtaining a sufficient amount of virus learning data from clinical specimens and achieving highly precise detection with increased reproducibility requires nanopore devices manufactured with improved accuracy and better yield, which has been a critical constraint. In addition, the ionic current obtained from the nanopore is very feeble, on the order of several tens of pA. The current characteristics obtained from the nanopore depend largely not only on the electrical characteristics of the nanopore, but also on those of the measuring device and of the fluidic device that transports the specimen to the nanopore [17]. Therefore, the development of a dedicated measuring device and flow channel suitable for the nanopores has also been a major limitation in realizing a highly accurate diagnostic system. In the current study, we have developed an artificial intelligence-assisted nanopore-based device to accurately detect viruses.

Nanopore platform for viruses
We have developed a nanopore module (25 mm × 25 mm × 5 mm) in which a nanopore chip and a plastic channel are fused (Fig. 1c). Both sides of the silicon chip were chemically bonded to the plastic channel. The hydrophilic channels on the front surface (blue) and the back surface (red) formed a crossbar structure and were connected through the nanopores. Through the specimen inlet with a diameter of 1 mm, 15 µl of buffer solution and specimen were pipetted into the front and back channels, respectively. Ag/AgCl electrodes were fabricated on the polymer substrate in each channel for stable, highly reproducible current measurement over long durations. A silicon chip (5 mm × 5 mm × 0.5 mm) on which a 50-nm-thick SiN film was deposited had nanopores about 300 nm in diameter, comparable in scale to the diameters of coronaviruses of about 80-120 nm (Figs. 1d, 1e) [1,19]. Silicon chips were manufactured in units of 12-inch wafers using microfabrication technology and were cut into chips by dicing. Through this mass production process, nanopores were produced with high accuracy (diameter error ±10 nm) and high yield (90 %). The diameter of the nanopore was flexible and was modified according to the size of the virus to be measured.

Learning and diagnostic protocols
The developed nanopore system does not require the extraction of RNA from the virus in either the learning or the inspection phase. All the specimens were measured by a simple procedure developed in-house using the portable nanoSCOUTER™. Because the properties of a cultured virus may differ from those of a clinical specimen containing the virus, machine learning is performed on samples from both sources when building a new virus testing device.
A classifier created by learning the current-time waveforms of each target cultured virus (Fig. 2a) is used to model the behavior of the cultured virus. During the learning process, a buffer solution is placed in the trans channel of the nanopore module, 15 µl of specimen diluted with the buffer solution is placed in the cis channel, and the module is placed in a dedicated cartridge. The cartridge is placed in the measuring instrument and the current-time waveform is measured at an applied voltage of 0.1 V or -0.1 V. About 100-1000 waveforms were required to learn one virus, which took less than a day. It took a few days to complete the learning of the cultured viruses.
For the clinical specimens, nasopharyngeal swabs and saliva were used. PCR-positive and PCR-negative clinical specimens of the novel coronavirus were learned in the same manner as the cultured viruses, and a classifier was built (Figure 2a). About 100-1000 waveforms are required to accomplish the learning. During the diagnostic process, the prepared classifier determines whether a clinical specimen is positive or negative within 5 minutes (Figure 2b).

Signal processing and AI
All the data were transferred to a cloud server, where machine learning models were built to analyze the ionic current-time waveforms. The signal processing software that we developed extracts only the waveform data from the raw data. The current vector $(I_1, I_2, \ldots, I_{10})$ is obtained by dividing each waveform into 10 segments along the time direction. The maximum current ($I_p$) and the current duration ($t_d$) (Figure 1b) are also used as features, and 10-fold cross-validation using the current vector is conducted.
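The feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' signal processing software: the function name `waveform_features`, the `sample_interval_s` parameter, the use of the mean current per segment, and the assumption that the pulse has already been isolated and baseline-subtracted are all hypothetical choices.

```python
from statistics import mean


def waveform_features(currents, sample_interval_s, n_segments=10):
    """Return (current_vector, peak_current, duration) for one pulse.

    currents: baseline-subtracted current samples of a single pulse
        (assumed to be already extracted from the raw trace).
    sample_interval_s: time between samples in seconds (hypothetical).
    n_segments: number of time segments; 10 matches the text.
    """
    n = len(currents)
    if n < n_segments:
        raise ValueError("pulse is shorter than the number of segments")
    # Integer boundaries that split the pulse into nearly equal slices.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    # Assumption: each element I_k is the mean current in segment k.
    current_vector = [
        mean(currents[bounds[i]:bounds[i + 1]]) for i in range(n_segments)
    ]
    # I_p: largest deviation from the baseline, as a magnitude.
    peak_current = max(abs(c) for c in currents)
    # t_d: pulse duration, from sample count and sampling interval.
    duration = n * sample_interval_s
    return current_vector, peak_current, duration
```

A classifier would then be trained on the resulting vectors, optionally combined with the peak current and duration; which combination performs best is something the cross-validation step decides.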
Subsequently, the waveforms of the individual viruses are learned and the classification models are created [20]. By performing cross-validation, the feature quantity and the classifier that give the highest discrimination accuracy are selected as the optimum classifier. In clinical specimens, which are rich in contaminants, the positive specimens contain the target virus and contaminants, whereas the negative specimens contain only contaminants. In such cases, the positive unlabeled classification (PUC) method [21] is used to remove the waveforms of contaminants [20]. The positive/negative decision is made by aggregating the classification results of the acquired waveforms.
The performance of the developed measurement system was evaluated using polystyrene nanoparticles of nearly uniform shape, with average diameters of 200 nm and 210 nm, diluted with 1 × phosphate buffered saline (PBS). The discrimination accuracy between the 200 nm and 210 nm nanoparticles using an ionic current-time waveform was ≥99.9 %. This indicates that a single ionic current-time waveform is sufficient to precisely differentiate the nanoparticles. When the classifier was built using measurement data obtained from five modules, so that it is unaffected by the manufacturing variations of the nanopores and by the measurement environment, a single ionic current-time waveform identified the 200 nm and 210 nm nanoparticles with a precision of 97.3 %. This accuracy indicates that if two waveforms are acquired, a classification accuracy of ≥99.9 % can be achieved.
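The source does not spell out how the per-waveform results are combined into one decision for a specimen. The sketch below shows one common option, averaging per-waveform scores and applying a cut-off, purely as an assumption; the function name `diagnose_specimen`, the score convention, and the default threshold are hypothetical.

```python
def diagnose_specimen(waveform_scores, threshold=0.5):
    """Combine per-waveform scores into one positive/negative call.

    waveform_scores: one score in [0, 1] per acquired waveform, where a
        higher score means "more likely from a positive specimen"
        (hypothetical classifier output).
    threshold: cut-off applied to the mean score (hypothetical default).
    """
    if not waveform_scores:
        raise ValueError("no waveforms were acquired for this specimen")
    # Assumption: a simple mean; the authors' aggregation rule may differ.
    mean_score = sum(waveform_scores) / len(waveform_scores)
    label = "positive" if mean_score >= threshold else "negative"
    return label, mean_score
```

Under a rule of this kind, a longer measurement yields more waveforms per specimen, so the averaged score becomes less sensitive to individual misclassified waveforms.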
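One reading that reproduces the two-waveform figure is to assume that classification errors on separate waveforms are independent and to ask how likely it is that every acquired waveform is misclassified. The sketch below works through that arithmetic; the independence assumption and the function name `at_least_one_correct` are ours, not the source's.

```python
def at_least_one_correct(single_accuracy, n_waveforms):
    """Probability that not all n independent waveforms are misclassified."""
    error = 1.0 - single_accuracy
    # All n waveforms are wrong with probability error**n.
    return 1.0 - error ** n_waveforms


# Single-waveform accuracy of 97.3 % from the text, two waveforms.
print(f"{at_least_one_correct(0.973, 2):.2%}")  # about 99.93 %
```

With a single-waveform error of 2.7 %, both waveforms are wrong with probability 0.027² ≈ 0.07 %, which is consistent with the stated ≥99.9 %.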

Discriminating cultivated coronaviruses
When an HCoV-229E viral sample of 100 pfu/ml is measured at -0.1 V, ionic current-time waveforms are obtained at a frequency of 14.2 pulses/min (Figs. 3a and 3b). The culture medium of HCoV-229E is DMEM, which is placed in the trans channel. When measured at 0.1 V, only one waveform is obtained in 15 minutes. To confirm that the waveforms obtained at -0.1 V are due to the passage of the virus through the nanopore, RT-PCR was performed on the solution in the trans channel. When the solution is extracted from the trans channel, the virus is expected to adsorb on the wall surface of the channel, reducing the number of viruses extracted. Hence, RT-PCR was performed after acquiring about 1,200 waveforms. The test detected the virus in the trans channel after it had passed through the nanopore at -0.1 V. However, RT-PCR on a trans channel left standing for 6 hours without an applied voltage detected no virus. This result demonstrates that efficient viral transport through the nanopore is achieved by electrophoresis rather than by diffusion.
To investigate the detection limit of the nanopore platform, the ionic current-time waveform was measured at -0.1 V while varying the HCoV-229E concentration (Figure 3b). The upper limit of the viral concentration was set at 250 pfu/ml because of the difficulty of culturing the virus at high concentrations. When the number of waveforms obtained in a 15-minute measurement was examined, an average of 3 waveforms was obtained at 2.5 pfu/ml. However, for the same duration at a concentration of 0.25 pfu/ml, no waveform was obtained. It is therefore concluded that the detection limit of coronavirus in DMEM is 2.5 pfu/ml when 0.1 V is applied for 15 minutes using a nanopore with a diameter of 300 nm.
By applying 0.1 V, a classifier for PCR-positive/negative nasopharyngeal swab specimens was created according to the protocol shown in Figure 2. Ionic current-time waveforms were also detected in the PCR-negative specimens, and the discrimination gave an F-value of 0.69. When this value is >0.50, it is possible to distinguish between positive and negative specimens with one waveform. Using the created classifier, a sensitivity of 87 % and a specificity of 73 % were obtained when the measurement time for one specimen was 1 minute (Fig. 4). The sensitivity and specificity were 87 % and 85 % when the measurement time was increased to 5 minutes, owing to the increase in the number of generated waveforms.
Diagnosis from collected saliva is a type of test that does not burden the patient. Since saliva samples have more contaminants than nasopharyngeal swabs, all the PCR-positive and PCR-negative saliva samples were diluted 1:1 with 1 × PBS and filtered through a 0.45 µm membrane filter. Twenty positive and 24 negative specimens that had been stored under refrigeration were used. Machine learning was performed using the same procedure and conditions as in the nasopharyngeal swab test. The accuracy of discrimination between PCR-positive and PCR-negative specimens by one ionic current-time waveform was F = 0.67. Sensitivity and specificity increased with the measurement time. After 5 minutes of measurement, a sensitivity of 95 % and a specificity of 92 % were recorded (Figure 4b). A classifier with 95 % sensitivity gives a false negative rate of 5 %, enabling its use for screening tests that diagnose a large number of people with high throughput.
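The text does not define the F-value. Assuming it is the standard F-measure, the harmonic mean of precision and recall computed over single waveforms, it can be calculated as in the sketch below. The function name `f_measure` and the counts in the usage example are illustrative only and are not taken from the study.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (standard F-measure)."""
    if true_pos == 0:
        # No correct positive calls: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 correct positive calls, 2 false alarms, 2 misses.
example_f = f_measure(true_pos=8, false_pos=2, false_neg=2)
```

Because this score is computed per waveform, a moderate value can still support a reliable specimen-level decision once many waveforms from the same specimen are combined.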
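The relationship between sensitivity, specificity, and the false negative rate can be made concrete with a short calculation. The counts used below (19 of 20 positives and 22 of 24 negatives classified correctly) are an assumption on our part: they are consistent with the reported 95 % and 92 % for 20 positive and 24 negative saliva specimens, but the source does not state the underlying confusion matrix.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts consistent with the reported percentages.
sens, spec = sensitivity_specificity(
    true_pos=19, false_neg=1, true_neg=22, false_pos=2
)
false_negative_rate = 1.0 - sens
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}, "
      f"false-negative rate={false_negative_rate:.0%}")
```

The false negative rate is simply the complement of the sensitivity, which is why a sensitivity of 95 % corresponds to a 5 % false negative rate.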

Conclusions
The artificial intelligence nanopore was successful in detecting viruses with high accuracy. By changing the training data from cultured viruses to PCR-positive/negative specimens, the artificial intelligence nanopore platform becomes a device capable of distinguishing positive from negative specimens with high sensitivity and high throughput. Because only the training data need to be changed, the platform is a versatile virus diagnostic system. For instance, infections caused by influenza A virus, which usually spreads between autumn and winter every year, show symptoms similar to those caused by SARS-CoV-2 [23][24][25]. When a person infected with the new coronavirus seeks medical care for flu-like symptoms, the risk of infecting medical staff and of spreading the infection increases unless the diagnosis is accurate. In this study, machine learning of cultured SARS-CoV-2 and influenza A virus (H1N1) yielded highly accurate discrimination, with an F-value of 0.90 (Supplementary Figure 5). When clinical specimens collected from patients infected with each virus are used as training data, the identification result for the cultured viruses suggests that a device capable of diagnosing both viruses with high accuracy can be developed.

Methods
Preparation of cultured viruses. African green monkey kidney Vero cells and human cervix adenocarcinoma HeLa cells were maintained and grown in Dulbecco's modified Eagle's medium (DMEM; Nacalai Tesque, Kyoto, Japan) containing 5 % fetal bovine serum (FBS; Thermo Fisher Scientific, MA, USA). SARS-CoV Frankfurt strain, MERS-CoV/EMC2012 strain and SARS-CoV-2/Hu/DP/Kng/19-020 strain (GenBank Accession number LC528232) were propagated using Vero cells. HCoV-229E (ATCC VR740) was propagated in HeLa cells. Viral stocks of these coronaviruses were filtered using a Millex-HV Syringe Filter Unit 0.45 µm (Merck, Darmstadt, Germany) and the filtrates were used for the diagnostic method using nanopores and machine learning. Viral titers were calculated by the modified 50 % tissue culture infectious dose (TCID50) assay and plaque assay [26,27]. The genome copy number of HCoV-229E was determined by quantitative PCR using a TaqMan probe. Influenza A (H1N1pdm09, California/7/2009) virus was added to MDCK cells in DMEM, and after incubation for 6 hours, trypsin was added at a final concentration of 2.5 µg/ml. The infected cells were incubated until a cytopathic effect was observed. The medium of the infected cells was centrifuged at 440 × g for 5 minutes, and the supernatant was collected and filtered through a filter with a pore size of 0.45 µm (Millex-HV; Millipore Co.). All the experiments using influenza virus were approved by the institutional biosafety committee and were carried out with due precautions in BSL-2 facilities.
Clinical specimens. Specimens were acquired as residual samples from clinical examination with the approval of the Institutional Review Board, Osaka University Hospital, Osaka University, Japan. Nasopharyngeal swab and saliva samples were examined with the SARS-CoV-2 Direct Detection RT-qPCR Kit (Takara Bio, Otsu, Shiga, Japan) using a Roche LightCycler 96 for the detection of SARS-CoV-2. In SARS-CoV-2-positive samples, viral copy numbers were calculated based on the Ct value of RT-qPCR.
Artificial intelligence nanopore platform. A nanopore module, a nanopore measuring instrument, and Aipore-ONE™, artificial intelligence software developed and implemented by Aipore Inc., were used. HCoV-229E was measured at the BSL-2 facility. SARS-CoV, SARS-CoV-2 and MERS-CoV were measured at the BSL-3 facility at the Research Institute for Microbial Diseases, Osaka University. The experiments were confirmed by the Institutional Biosafety Committee. The nanoSCOUTER™ was used for all measurements.

Declarations
Data availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.