Ultrafast Preliminary Screening of COVID-19 by Machine Learning Analysis of Exhaled NO


Background A new coronavirus, SARS-CoV-2, has caused the coronavirus disease-2019 (COVID-19) epidemic. Current diagnostic methods mainly include nucleic acid detection, antibody detection, antigen detection, and chest computed tomography (CT) imaging. Although these methods are crucial for diagnosing COVID-19, a rapid and economical method for preliminary screening is still lacking.
Methods We measured the fractional exhaled nitric oxide (FeNO) concentrations of 103 subjects without COVID-19 and 46 patients with COVID-19. Using machine learning (ML), we built a model based on FeNO concentration, age, and body size features for rapid, low-cost preliminary screening of COVID-19 suspects.
Findings A t-test shows a significant difference between the FeNO of healthy people and that of patients with COVID-19. The ML model can screen out patients with COVID-19 or other diseases that show abnormal FeNO distributions. An area under the curve (AUC) of 0.982 and a sensitivity of 0.917 were achieved for preliminary screening of COVID-19 suspects. This non-invasive detection method, which takes less than two minutes and costs less than a dollar, could provide a direction for controlling the rapid spread of COVID-19.
Interpretation During the COVID-19 pandemic, extensive testing of large numbers of people remains a problem. Public health efforts to limit the spread of SARS-CoV-2 need a more economical and faster screening method.


Introduction
Transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative virus of the coronavirus disease-2019 (COVID-19) epidemic, can be effectively prevented by developing a rapid and large-scale screening method for the SARS-CoV-2-infected population [1][2][3]. Screening suspected patients for isolation and maintaining social distancing can effectively prevent the spread of the virus [4][5]. At present, the detection of SARS-CoV-2 infection mainly depends on nucleic acid detection, antibody detection, antigen detection, and chest computed tomography imaging [6,7]. In controlling the rapid spread of SARS-CoV-2, nucleic acid detection became the standard because of its accuracy and efficiency, but it usually costs tens of dollars and takes at least 30 minutes per case. For screening large populations, such as more than ten million people, this method is too expensive and time-consuming. Therefore, developing a rapid and accurate preliminary diagnosis method for large populations is a grand challenge [8].
Fractional exhaled nitric oxide (FeNO) has been considered an important biomarker for the detection and diagnosis of respiratory diseases [9]. In 1991, Gustafsson and coworkers found that humans produce exhaled NO at parts-per-billion (ppb) levels and that there was evidence of endogenous NO synthesis in the lower respiratory tract [10]. After at least 28 years of exploration, it was determined that the detection of FeNO could predict lung cancer, asthma, scleroderma, sarcoidosis, and other lung diseases [9,11-18]. After the severe acute respiratory syndrome coronavirus (SARS-CoV) outbreak in 2002-2003, relevant studies showed that the NO concentration in the organism changes with the degree of infection by a coronavirus [19].
For example, in the early stage of infection with SARS-CoV, activation of epithelial cells, such as those in alveoli, to produce cytokines results in the upregulation of inducible NO synthase, further increasing the NO concentration and thus playing an antiviral role in controlling SARS-CoV infection [20]. At the same time, excessive NO is converted into peroxynitrite and other substances, damaging human immune cells, promoting SARS-CoV infection, and inhibiting NO production, resulting in a reduction in the NO concentration. Importantly, the NO concentration in SARS patients is significantly different from that in healthy controls. Considering that SARS-CoV-2 and SARS-CoV have similarities [21], we infer that the FeNO concentration in COVID-19 patients may differ from that in healthy controls.
In this work, we demonstrate that the FeNO concentration is correlated with COVID-19. However, FeNO alone cannot be used to identify COVID-19 patients because the statistical distributions of FeNO in COVID-19 patients and healthy subjects overlap. We therefore developed an ML model that integrates FeNO with other features, including age, body mass index (BMI), and body area, which have been reported to be related to FeNO values. This model successfully identified COVID-19 suspects with an AUC of 0.982 and a sensitivity of 0.917.

Clinical data
We collected data on age, sex, and anamnesis (tuberculosis, chronic obstructive pulmonary disease, lung cancer, allergic rhinitis, pharyngitis, heart disease, diabetes, and stomach disease). The analysis was conducted on a sample of 149 subjects in the Hubei and Shandong provinces of China from March 2020 to June 2020. Among the subjects, 103 were non-COVID-19 subjects, and 46 were COVID-19 patients from LiYuan Hospital, affiliated with TongJi Medical College, HuaZhong University of Science and Technology, Wuhan. Among the non-COVID-19 subjects, 93 were healthy without other diseases and were designated Hea-(1, 2…), 2 had hypertension and were designated Hyp-(1, 2), 1 had a gastric disorder and was designated Gastr-1, 5 had pharyngitis and were designated Phar-(1, 2…), and 2 had asthma and were designated Asth-(1, 2). The 46 COVID-19 patients had no marked distinctions and were designated Cov-(1, 2…). Fractional exhaled NO samples and data on bodily characteristics were obtained from each subject. All volunteers were informed of the content of the test, and informed consent was obtained from each subject. More detailed information for each subject can be found in the Supplementary Table S1-2. The demographic characteristics of all tested patients and controls in the current study are summarized in Table 1.

Statistical analysis
As shown in Figure 1A, the FeNO values of healthy people and COVID-19 patients differ significantly; the p value of Student's t-test is 0.006. Although this difference is statistically significant, the FeNO value alone cannot distinguish COVID-19 patients because the two statistical distributions overlap, as shown in Figure 1B.
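As an illustration of the comparison above, the following minimal sketch computes a two-sample Student's t statistic with pooled variance using only the Python standard library. The FeNO readings are made-up values, not study data, and the function name `student_t` is hypothetical. Converting the statistic into a p value requires the t-distribution CDF, which a statistics package would normally provide and which is left out here.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). The p value is not computed here.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Made-up FeNO readings in ppb, for illustration only (not study data).
healthy_feno = [18.0, 22.0, 15.0, 20.0, 25.0, 17.0]
covid_feno = [12.0, 30.0, 9.0, 11.0, 14.0, 10.0]

t_stat, dof = student_t(healthy_feno, covid_feno)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A large absolute t with overlapping raw values is exactly the situation described above: the group means differ, yet a single reading cannot be assigned to a group with confidence.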
FeNO levels have been reported to be affected by many factors, including age, gender, and body size parameters. We therefore introduced additional features and used an ML algorithm to identify COVID-19 patients. ML methods have been widely applied in scientific research, and many scientists have used them to help control COVID-19 outbreaks, for example in diagnosis and in predicting anti-SARS-CoV-2 activity [22][23].

Machine learning model development
Our ML classification framework is schematically illustrated in Figure 2. The approach consists of three parts: feature engineering, ML model training, and output of the COVID-19 status probability. Feature engineering is first used to evaluate feature correlation. Pearson correlation coefficient matrices are calculated to identify the positive and negative correlations between pairs of features (e-Figure 1). Body mass index (BMI) and body area are highly correlated with weight and height, whereas age and FeNO show low linear correlations with the other features. The low linear correlations and high importance of age and FeNO indicate that they should be kept during ML model construction. Redundant features should be removed, which improves the performance of the ML model.
We randomly split the subjects into two sets: the training set and the test set (Figure 2). The training set contained 70 healthy subjects and 34 COVID-19 patients. The test set comprised 10 non-COVID-19 subjects with asthma, gastric disorder, pharyngitis, or hypertension; 23 healthy subjects; and 12 COVID-19 patients. The 23 healthy subjects and 12 COVID-19 patients in the test set were selected at a roughly 2:1 ratio of healthy to COVID-19 subjects, and together they made up 25% of all healthy and COVID-19 subjects. Because the dataset is small, 10-fold cross-validation (CV) was performed to improve our estimate of the "out-of-sample" error.
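The 10-fold CV step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the folds are stratified by class (the text does not say whether they were), and the helper names `stratified_folds` and `cv_splits` are hypothetical. Only the class counts of the training set (70 healthy, 34 COVID-19) are taken from the text.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds with similar class ratios.

    Assumption: stratification by class label; the source does not
    state how the folds were formed.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


def cv_splits(labels, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


# Training-set composition from the text: 70 healthy (0), 34 COVID-19 (1).
labels = [0] * 70 + [1] * 34
splits = list(cv_splits(labels, k=10))
for fold_no, (train, validation) in enumerate(splits, start=1):
    n_pos = sum(labels[i] for i in validation)
    print(f"fold {fold_no}: train={len(train)}, validation={len(validation)}, COVID-19={n_pos}")
```

Each subject is held out exactly once, so every prediction used to estimate the error comes from a model that did not see that subject during fitting.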
Cross-validation is commonly used when dealing with small datasets in ML [24][25]. We found that the best ML model came from the feature combination of 'Age', 'BMI', 'Body Area', and 'FeNO' (Figure 3, e-Figures 2 and 3). Figure 3A shows the performance of the model in the training cohort, and Figure 3B shows the receiver operating characteristic (ROC) curve of the ML model in the training cohort. This model reached an AUC of 0.996 (95% CI 0.988-1.000) in the training set. One COVID-19 patient (Cov-37) and one healthy subject (Hea-131) were incorrectly classified.
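The feature combination above relies on the Pearson correlation screening described in the feature engineering step. The following minimal sketch shows how such a correlation matrix can be computed and why a derived feature such as BMI is strongly tied to weight. All subject values are made up for illustration, the function names are hypothetical, and body area is omitted because the text does not state which formula was used for it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in m."""
    return weight_kg / (height_cm / 100) ** 2


def correlation_matrix(features):
    """Nested dict of pairwise Pearson coefficients between named features."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Made-up subjects, for illustration only (not study data).
height_cm = [160, 165, 170, 175, 180, 185]
weight_kg = [52, 60, 63, 72, 80, 85]
features = {
    "Age": [34, 51, 27, 45, 62, 30],
    "Height": height_cm,
    "Weight": weight_kg,
    "BMI": [bmi(w, h) for w, h in zip(weight_kg, height_cm)],
    "FeNO": [18.0, 12.0, 22.0, 15.0, 9.0, 20.0],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name.ljust(7), " ".join(f"{row[other]:+.2f}" for other in features))
```

Pairs with coefficients close to +1 or -1 carry largely the same information, which is the sense in which a feature is treated as redundant.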
We then tested the model on the 45 unseen subjects of the test cohort. As shown in Figure 3C, [...] suspects. The five subjects with pharyngitis were all classified as healthy non-COVID-19 subjects. To our knowledge, no research has shown that pharyngitis is related to FeNO. Although this ML model will also screen out people with other diseases that affect the FeNO value, these people account for a very small fraction of the non-COVID-19 population, so it is reasonable to use the model for primary screening of COVID-19 suspects.
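The two metrics reported for the model, AUC and sensitivity, can be computed from predicted probabilities as in the sketch below. The scores are made up for illustration and the function names are hypothetical. The 0.5 cutoff follows the text, and treating a score of exactly 0.5 as low risk is an assumption. For reference, a sensitivity of 0.917 is what 11 correct calls among 12 patients would give.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity(labels, scores, threshold=0.5):
    """True-positive rate: share of actual positives scored above threshold."""
    positive_scores = [s for label, s in zip(labels, scores) if label == 1]
    true_positives = sum(1 for s in positive_scores if s > threshold)
    return true_positives / len(positive_scores)


# Made-up model outputs, for illustration only (not study data):
# 12 positives, one of which falls below the 0.5 cutoff, and 6 negatives.
labels = [1] * 12 + [0] * 6
scores = [0.95, 0.91, 0.88, 0.86, 0.80, 0.77, 0.74, 0.70, 0.66, 0.61, 0.58, 0.42,
          0.45, 0.30, 0.22, 0.15, 0.10, 0.05]

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"sensitivity = {sensitivity(labels, scores):.3f}")
```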

Discussion
The procedure for screening COVID-19 with our method is as follows. First, collect the data on FeNO, age, sex, body size, and anamnesis of each subject. The FeNO detection process is shown in the Supplementary Video. Second, use the ML model to calculate the probability of having COVID-19 for each subject. Subjects with an output probability greater than 0.5 should be referred to a hospital for further nucleic acid testing, and those with a probability of less than 0.5 are classified as low-risk subjects. The whole test takes less than 2 minutes. The consumables include a plastic filter and a breath collection bag, and the cost is about 0.3 US dollars. For comparison, the cost of nucleic acid testing consumables is about 10 US dollars. For example, nucleic acid testing of 10 million people would cost about 100 million US dollars, whereas our preliminary screening method would cost only about 3 million US dollars. This approach simultaneously improves detection efficiency and reduces consumable costs by more than an order of magnitude (about 33-fold in this example).
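The decision rule and the cost comparison above reduce to a few lines of arithmetic. The sketch below uses the figures quoted in the text (0.5 probability cutoff, 0.3 and 10 US dollars per test, 10 million people). The function names are hypothetical, and assigning a probability of exactly 0.5 to the low-risk group is an assumption, since the text leaves that case open.

```python
HIGH_RISK_THRESHOLD = 0.5  # cutoff stated in the text


def triage(probability):
    """Map a model output probability to the screening recommendation."""
    if probability > HIGH_RISK_THRESHOLD:
        return "refer for nucleic acid testing"
    return "low risk"  # assumption: exactly 0.5 is treated as low risk


def consumables_cost(population, unit_cost_usd):
    """Total consumables cost of testing every person once."""
    return population * unit_cost_usd


population = 10_000_000
nucleic_acid_total = consumables_cost(population, 10.0)  # about 10 USD per test
feno_total = consumables_cost(population, 0.3)           # about 0.3 USD per test
ratio = nucleic_acid_total / feno_total

print(triage(0.73), "|", triage(0.12))
print(f"nucleic acid: {nucleic_acid_total:,.0f} USD")
print(f"FeNO screening: {feno_total:,.0f} USD")
print(f"cost ratio: {ratio:.1f}x")
```

This counts consumables only. Subjects flagged as high risk would still need a nucleic acid test, so the total cost of a real campaign depends on how many people are referred.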
Limitations and outlook
The main limitation of the model is the small sample size. This limitation arose mainly because the wide regional distribution of new COVID-19 cases impeded the collection of clinical specimens at an early stage. We focused on early screening of COVID-19 cases that were asymptomatic or had mild symptoms; thus, all of the patients in this study are mild or asymptomatic cases. FeNO varies with the infection level, so a different model would be needed for severe cases; however, assessing FeNO in severe cases was not our purpose. In addition, if two further parameters were added to the model, namely whether the subject had contact with a COVID-19 patient and whether the subject comes from an epidemic zone, the screening accuracy could improve further. In the future, we will seek more cooperation to collect more samples. For now, we can only handle the small-dataset ML process as carefully as possible to achieve the highest possible accuracy. We believe that as more samples are collected, the accuracy of the model will improve significantly, which will help control the epidemic.

Conclusion
In conclusion, we developed a highly accurate method for the rapid screening of potential COVID-19 patients based on FeNO detection. We believe that the proposed ML model, combined with FeNO detection, could be a useful preliminary screening tool that detects diseases with abnormal FeNO, such as COVID-19, in time and does not require more complicated technologies.

Figure 2. Illustration of the ML model framework. First, we measured the FeNO levels of each subject and obtained basic information, including age, height, weight, and anamnesis. Then, we trained the ML model on the training cohort using a 10-fold CV method. The ML classifier was then validated in an independent test cohort. Finally, we used the trained ML model to screen COVID-19 suspects. When the predicted probability was greater than 0.5, the subject was classified as a COVID-19 patient or as a subject with another disease, was categorized into the high-risk group, and a further nucleic acid test was recommended. If the probability was less than 0.5, the subject was treated as low risk.