Ultrafast Screening of COVID-19 by Machine Learning Analysis of Exhaled NO

A new coronavirus, SARS-CoV-2, has caused the coronavirus disease 2019 (COVID-19) epidemic. Current diagnostic methods, including nucleic acid detection, antibody detection, antigen detection and chest computed tomography (CT) imaging, usually take hours, and identification of the disease costs hundreds of dollars. Therefore, an ultrafast and economical detection method is urgently required to control the spread of the epidemic. Here, we report a rapid and low-cost method for the preliminary screening of COVID-19 suspects from healthy people. We established a machine learning (ML) model based on the fractional exhaled nitric oxide (FeNO) concentration, age, sex and body size of 34 COVID-19 patients and 70 healthy subjects. The model was then applied to 45 independent subjects, including 12 mild and asymptomatic COVID-19 patients, 10 patients with other diseases, and 23 healthy subjects. Patients with diseases affecting FeNO, including COVID-19, asthma, hypertension and others, were screened out as suspects at a rate of 94.1%. Only one healthy subject was misclassified. This noninvasive and comfortable detection procedure takes less than two minutes and costs less than a dollar, which simultaneously improves the detection efficiency and reduces expenses by multiple orders of magnitude. This work may provide a direction for controlling the rapid spread of COVID-19.

The diagnosis of SARS-CoV-2 infection mainly depends on nucleic acid detection, antibody detection, antigen detection and chest computed tomography (CT) imaging 2,3. The cost of these methods ranges from 11 dollars to 71 dollars, and test completion usually takes 15 minutes to 4 hours. Although these methods can effectively and accurately detect COVID-19 patients and help guide epidemic prevention and control, they are time-consuming and uneconomical for places with large population flow and dense populations. Therefore, a major challenge is to develop a test for rapid and accurate preliminary diagnosis that has low requirements in terms of equipment and personnel and that is suitable for large-scale, effective screening of new coronavirus infections among suspected cases in areas of high population flow. 4 Fractional exhaled nitric oxide (FeNO) has been considered an important biomarker for the detection and diagnosis of respiratory diseases [5][6][7]. In 1991, Gustafsson and coworkers found that humans produce FeNO at a magnitude of parts per billion (ppb), and that there was evidence of endogenous NO synthesis in the lower respiratory tract 8. After at least 28 years of exploration, it was determined that the detection of FeNO could accurately predict lung cancer, asthma, scleroderma, sarcoidosis and other lung diseases [5][6][7][9][10][11][12][13][14][15][16][17]. After the severe acute respiratory syndrome coronavirus (SARS-CoV) outbreak in 2002-2003, relevant studies showed that the NO concentration in the organism changes with the degree of infection by a coronavirus 18,19. For example, in the early stage of infection with SARS-CoV, activation of epithelial cells, such as those in alveoli, to produce cytokines results in the upregulation of inducible NO synthase, further increasing the NO concentration and thus playing an antiviral role in controlling SARS-CoV infection 20.
At the same time, excessive NO is converted into peroxynitrite and other substances, damaging human immune cells, promoting SARS-CoV infection, and inhibiting NO production, resulting in a reduction in the NO concentration. Importantly, the NO concentration in SARS patients is significantly different from that in healthy controls. Considering that SARS-CoV-2 and SARS-CoV have similarities 21, we infer that the FeNO concentration plays a crucial role in the preliminary diagnosis of COVID-19. To analyse the correlation between FeNO and COVID-19, we used the machine learning (ML) method, which has been widely applied in scientific research. [22][23][24] During the COVID-19 epidemic, we measured the FeNO concentrations of 103 subjects without COVID-19 and 46 patients with COVID-19 and set up a database. We established an ML model based on FeNO, age, sex and body size for COVID-19 screening. This ML model had an area under the curve (AUC) of 0.989 for the test set, and the screening-out rate of suspected patients was 94.1%. It avoids the discomfort caused by traditional nucleic acid detection sampling and could increase the detection efficiency and reduce the detection cost by orders of magnitude. The false negative rate could be kept at a low level. Thus, this test has the potential to be used as a preliminary screening tool for the widespread detection of COVID-19.

Methodology And Results
Our classification machine learning framework is schematically illustrated in Figure 1. The approach consists of three parts: feature engineering, ML model training, and output of the COVID-19 status probability. First, feature engineering is needed to evaluate the importance of each feature for classifying COVID-19. The random forest method was used for feature engineering. As shown in Figure 2A, age, which has been reported in the literature to be highly related to COVID-19, plays an important role in the COVID-19 classification 25. The most important information in Figure 2A is that FeNO is confirmed to play an essential role, supporting our speculation. FeNO showed a statistically significant difference between COVID-19 patients and healthy subjects, as shown in Figure 2B. Sex seems to have little effect on the COVID-19 classification. Pearson correlation coefficient matrices were calculated to identify the positive and negative correlations between pairs of features (Figure 2C). Body mass index (BMI) and body area are highly correlated with weight and height. In contrast, age and FeNO show low linear correlations with the other features, which indicates that they should be kept in the ML model construction process. Redundant features should be removed, which improves the performance of the ML model. It is noteworthy that the FeNO value alone could not separate COVID-19 patients from healthy subjects because their statistical distributions overlap, as shown in Figure 2B. Therefore, we introduced more features for ML model training. We randomly split the subjects into two sets: the training set and the test set (Figure 1). The test set comprised 10 non-COVID-19 subjects with asthma, gastric disorders, pharyngitis and hypertension, 23 healthy subjects and 12 COVID-19 patients.
The 23 healthy subjects and 12 COVID-19 patients in the test set were selected according to a 2:1 ratio of healthy to COVID-19 subjects, and they made up 25% of the total number of healthy and COVID-19 subjects. The training set contained 70 healthy subjects and 34 COVID-19 patients. Ten-fold cross-validation (CV) was performed to improve our estimate of the "out-of-sample" error. We found that the best ML model came from the feature combination of 'Age', 'BMI', 'Body Area' and 'FeNO' ( antihypertensive drugs, such as nitroglycerin (GTN) and sodium nitroprusside (SNP) will increase the FeNO according to prior research. 27 Gastr-1 has a gastric disorder. Stomach disease can affect the FeNO according to prior research. 28 Subjects with pharyngitis were all classified as healthy non-COVID-19 subjects. To our knowledge, there is no research showing that pharyngitis is related to FeNO. According to the prediction results, this ML model will screen out patients who are suffering from diseases that affect FeNO values, including COVID-19, hypertension, asthma and gastric disorder. If we treat the classification results for asthma, hypertension, gastric disorder and pharyngitis as correct, the AUC of the ML model is 0.989 and the screening-out rate of suspected patients is 94.1%. It is noteworthy that although this machine learning model will screen out people with other diseases affecting the FeNO value, these people account for a very small part of the non-COVID-19 population, so it is reasonable to use this model for the primary screening of COVID-19 suspects.
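The workflow described in this section (Pearson correlations, random forest feature importance, a stratified split, and ten-fold cross-validated tuning scored by AUC) can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the cohort is randomly generated placeholder data, the column names are hypothetical, body area is computed with the Du Bois formula because the paper does not say which formula was used, and the grid values and random seeds are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

rng = np.random.default_rng(0)


def make_synthetic_cohort(n_healthy=93, n_covid=46):
    """Build random placeholder data; these are NOT the study's measurements."""
    n = n_healthy + n_covid
    label = np.concatenate([np.zeros(n_healthy, dtype=int), np.ones(n_covid, dtype=int)])
    height_m = np.clip(rng.normal(1.68, 0.08, n), 1.4, 2.0)
    weight_kg = np.clip(rng.normal(65.0, 10.0, n), 40.0, 120.0)
    features = pd.DataFrame({
        "Age": rng.integers(18, 80, n),
        "Sex": rng.integers(0, 2, n),
        "Height": height_m,
        "Weight": weight_kg,
        "BMI": weight_kg / height_m**2,
        # Assumption: Du Bois body surface area; the paper gives no formula.
        "BodyArea": 0.007184 * weight_kg**0.425 * (height_m * 100.0) ** 0.725,
        # Assumption: an arbitrary shift so the two classes overlap but differ.
        "FeNO": rng.normal(20.0, 8.0, n) + 10.0 * label,
    })
    return features, label


X, y = make_synthetic_cohort()

# Pearson correlation matrix between all pairs of features.
corr = X.corr(method="pearson")

# Stratified 75/25 split keeps the healthy-to-COVID ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random forest feature importance computed on the training set only.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
importances = pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)

# Tune the number of estimators and class_weight with 10-fold CV, scored by AUC.
selected = ["Age", "BMI", "BodyArea", "FeNO"]
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "class_weight": [None, "balanced"]},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
grid.fit(X_train[selected], y_train)

# Evaluate the refitted best model once on the held-out test set.
test_auc = roc_auc_score(y_test, grid.predict_proba(X_test[selected])[:, 1])
print(importances.round(3))
print(grid.best_params_, round(test_auc, 3))
```

Because the data are synthetic, the printed importances and AUC say nothing about the real cohort; the sketch only shows how the steps fit together and why the test set is touched a single time, after tuning.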

Discussion
Our proposed model does have some limitations. These limitations were mainly due to biological safety restrictions that limited the collection of a large number of clinical specimens during the SARS-CoV-2 outbreak. As shown in Figure 3D, people suffering from diseases that affect FeNO, such as asthma, gastric disorder and hypertension, will be classified as COVID-19. In other words, the suspected patients screened are likely to include patients with other diseases that affect FeNO values. Considering that COVID-19 has spread globally, with cases in over 150 countries, our model may provide a direction for the preliminary screening of this disease. The patients in this study all had mild symptoms or were asymptomatic. For severe cases, FeNO varies with the infection level, and the model should be different; however, assessing FeNO in severe cases was not our purpose. We mainly focused on early detection to screen for COVID-19 cases that were asymptomatic or had mild symptoms. If the model is to be applied in large-scale clinical screening, more data should be collected to improve our ML model. In fact, two additional parameters should be added to the model, namely, whether the subject had contact with other COVID-19 patients and whether the subject comes from an epidemic zone. An upgraded model that includes these two parameters will further improve the detection accuracy. However, such a model requires extensive data collection, which is difficult to carry out in China at this time.
The size of the current FeNO analyser is 275 mm × 210 mm × 88 mm, and its power supply voltage and frequency are AC 100-240 V and 50-60 Hz, respectively. A portable FeNO detection device (Bedfont NO breath) exists in the United Kingdom, but we could not obtain it because of the COVID-19 epidemic. With the continuing miniaturization and growing intelligence of gas detection equipment, it will be easy to establish a personal detection system by combining such equipment with portable terminal devices, such as smartphones. This would make it convenient for people to check their health status every day and seek health care at the earliest stage of COVID-19, when it can be cured most easily. In addition, this system is simple to use and does not require users to have long-term systematic medical training, which enables the promotion of this detection technology in the community. After obtaining the consent of the subjects being tested, a big-data analysis of specific exhaled breath components across the population can be performed, which would promote the application of characteristic gas detection in response to other respiratory infection outbreaks.
The procedure for screening COVID-19 with our method is as follows. First, collect the data on FeNO, age, sex, body size and anamnesis (tuberculosis, chronic obstructive pulmonary disease (COPD), lung cancer, allergic rhinitis, pharyngitis, heart disease, diabetes, and stomach disease) of each subject. The FeNO value is measured by having the subject blow into the breath collection bag connected to the detector (the FeNO detection process is shown in the Supplementary Video). Second, the ML model is used to calculate the probability of having COVID-19 for each subject. Subjects with an output probability greater than 0.5 are best admitted to the hospital for further nucleic acid testing, and those with a probability of less than 0.5 are classified as low-risk subjects. The whole test takes less than 2 minutes. The consumables include a plastic filter and a breath collection bag, and their cost is about 0.3 US dollars. For comparison, the cost of nucleic acid testing consumables is about 10 US dollars. For example, nucleic acid detection for 10 million people costs 100 million US dollars, while our detection method for preliminary screening costs only 3 million US dollars. This detection method simultaneously improves the detection efficiency and reduces expenses by multiple orders of magnitude. On the whole, once our method can be used in clinical applications, it will significantly reduce the cost and time of screening and may help to control the spread of COVID-19 effectively.
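The decision rule and the consumables cost comparison in the procedure above can be written out as a short sketch. The function names are hypothetical, and the text does not say how a probability of exactly 0.5 is handled; treating it as low risk here is an assumption.

```python
THRESHOLD = 0.5  # decision threshold stated in the text


def triage(probability: float, threshold: float = THRESHOLD) -> str:
    """Map a model output probability to the screening decision."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Assumption: a probability exactly equal to the threshold counts as low risk.
    if probability > threshold:
        return "refer for nucleic acid testing"
    return "low risk"


def consumables_cost(n_people: int, unit_cost_usd: float) -> float:
    """Total consumables cost in US dollars for testing n_people once."""
    return n_people * unit_cost_usd


population = 10_000_000
nucleic_cost = consumables_cost(population, 10.0)  # about 10 USD per test
feno_cost = consumables_cost(population, 0.3)      # about 0.3 USD per test

print(triage(0.73), "|", triage(0.12))
print(f"nucleic acid: {nucleic_cost:,.0f} USD, FeNO screening: {feno_cost:,.0f} USD")
```

Only consumables are counted, as in the text; equipment, staffing and the follow-up nucleic acid tests for referred subjects are outside this comparison.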

Conclusion
In conclusion, these results demonstrate the role of a highly accurate ML model for the rapid screening of potential COVID-19 patients based on FeNO detection. We believe that the proposed ML model, which combines FeNO detection, could be a useful screening tool that can be used to discover infectious diseases such as COVID-19 in time and that does not require more complicated technologies. This method has the potential to address similar respiratory infection outbreaks in the future.

Methods
Data Collection and Processing. We collected data on age, sex and anamnesis (tuberculosis, COPD, lung cancer, allergic rhinitis, pharyngitis, heart disease, diabetes, and stomach disease). This analysis was conducted on a control sample of 149 subjects in the Hubei and Shandong provinces of China from March 2020 to June 2020. Among the subjects, 103 were non-COVID-19 subjects, and 46 were COVID-19 patients from LiYuan Hospital affiliated with TongJi Medical College, HuaZhong University of Science and Technology, Wuhan. Among the non-COVID-19 subjects, 93 subjects were healthy without other diseases and were designated Hea-(1, 2…), 2 subjects had hypertension and were designated Hyp-(1, 2), 1 subject had a gastric disorder and was designated Gastr-1, 5 subjects had pharyngitis and were designated Phar-(1, 2…), and 2 subjects had asthma and were designated Asth-(1, 2). The 46 COVID-19 patients had no marked distinctions and were designated Cov-(1, 2…). Fractional exhaled NO samples and data on bodily characteristics were obtained from each subject. All volunteers were informed of the content of the test, and informed consent was obtained from each subject. More detailed information for each subject can be found in the Supporting Information. The demographic characteristics of all tested patients and controls in the current study are reviewed in Supplementary Table 1. Detailed test information on breath collection and analysis is also provided in the Experimental Methods.
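The subgroup counts listed above and the training and test set sizes given in the Methodology and Results section can be tallied in a few lines. The dictionary keys are illustrative labels; every number is taken from the text.

```python
# Subgroup counts as listed in the data collection paragraph.
non_covid = {
    "healthy": 93,
    "hypertension": 2,
    "gastric disorder": 1,
    "pharyngitis": 5,
    "asthma": 2,
}
covid = 46

n_non_covid = sum(non_covid.values())
n_total = n_non_covid + covid

# Training and test set composition as given in Methodology and Results.
train = {"healthy": 70, "covid": 34}
test = {"healthy": 23, "covid": 12, "other diseases": 10}

# Every healthy and COVID-19 subject is in exactly one of the two sets.
assert train["healthy"] + test["healthy"] == non_covid["healthy"]
assert train["covid"] + test["covid"] == covid
# The 10 subjects with other diseases appear only in the test set.
assert test["other diseases"] == n_non_covid - non_covid["healthy"]

# Share of healthy and COVID-19 subjects held out for testing.
held_out_share = (test["healthy"] + test["covid"]) / (non_covid["healthy"] + covid)

print(n_non_covid, n_total, round(held_out_share, 3))
```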
FeNO level measurement. The FeNO level was measured using a Nano Coulomb Breath Analyser obtained from Wuxi Sunvou Medical Electronic Co., Ltd, China. The Sunvou Nano Coulomb Breath Analyser can measure the NO concentration in the large and small airways by using different flow rates. It is based on and fully conforms to the technical standards updated by the American Thoracic Society (ATS) and European Respiratory Society (ERS) in 2005 and by the ERS in 2017. According to this standard, the determination of NO in the exhaled breath includes the parameters eNO and nNO, where eNO = bronchus NO/flow F + alveolus NO + correction term. To ensure the accuracy of the test results and meet the requirements for the use of the instrument, the subjects were not allowed to eat, smoke, exercise, or perform lung function tests during the hour before sampling. In addition, during the three hours before sampling, the subjects were not allowed to eat broccoli, kale, lettuce, celery, radish, or smoked or pickled food. For detailed information, see www.sunvou.com.
ML model. We developed random forest classifiers based on the FeNO, age, sex, height, weight, BMI, and body area of each subject. We fine-tuned the hyperparameters of the random forest classifier on the training and validation sets and evaluated the model on the test set. Ten-fold cross-validation was used in this study. The number of estimators and "class_weight" were tuned. The ML model with the highest AUC score on the validation set was selected. We used the Scikit-learn 29 package to train and evaluate these models.

Declarations