High accuracy classification of COVID-19 coughs using Mel-frequency cepstral coefficients and a Convolutional Neural Network with a use case for smart home devices

Diagnosing COVID-19 early in domestic settings may be possible through smart home devices that classify audio input of coughs and determine whether they indicate COVID-19. Research is currently sparse in this area and data is difficult to obtain. However, a few small data collection projects have enabled audio classification research into the application of different machine learning classification algorithms, including Logistic Regression (LR), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). We show here that a CNN using audio converted to Mel-frequency cepstral coefficient spectrogram images as input can achieve high accuracy, correctly classifying 97.5% of validation data labelled covid and not covid. The work here provides a proof of concept that high accuracy can be achieved with a small dataset, which can have a significant impact in this area. The results are highly encouraging and provide further opportunities for research by the academic community on this important topic.


Introduction
The 2019 novel coronavirus, COVID-19, which became a pandemic in 2020, has been the largest global public health emergency in living memory. The virus is a strain of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], which afflicts the respiratory system and therefore causes symptoms such as coughing and breathing difficulties, fever, and fatigue, as well as ageusia and anosmia [11]. Many research efforts are underway into various aspects of COVID-19, including vaccine research [12], anti-viral treatments such as Remdesivir [5], large scale literature data mining such as the COVID-19 Open Research Dataset Challenge (CORD-19), and diagnostic tools for determining who has the virus at an early stage [11]. In this study we investigate the feasibility of high accuracy audio classification of COVID-19 coughs as a potential diagnostic software application that would be available in the home or workplace through smart devices such as Amazon's Alexa or Google Home.

Related work
Brown et al [3] position their research as one in a long history of using bodily noises to diagnose ailments, following the logical conclusion that physiological changes can alter the natural sounds that human bodies produce [14]. They classify coughing and breathing audio using Logistic Regression (LR) and Gradient Boosting Trees with Support Vector Machines (SVMs), achieving a best AUC of 82%. The distribution of participants in their paper also shows that the dataset is skewed towards middle aged people, likely due to older participants (those more vulnerable to COVID-19) being less likely to engage with mobile phone crowdsourcing technology. However, the results show no difference in classification between the age groups.
Imran et al have also developed a mobile app which can classify a COVID-19 cough from a 2 second audio recording with accuracy in the region of 90%, with some data permutations producing an accuracy of 96.76% [7]. The research used Mel-frequency cepstral coefficients (MFCC), a type of spectrogram image, and a Convolutional Neural Network (CNN) for classification, which demonstrated better accuracy than the LR and SVM used by Brown et al.
Lastly, Sharma et al created Coswara, a database of coughing, breathing, and voice sounds (vowel sounds and counting) for COVID-19 diagnosis research. The data is collected through a web application which is used to collect, label, and quality control the dataset [10]. The data was then classified using a Random Forest (RF), achieving a mean accuracy of 70% for coughing.

Rationale
Research work is currently ongoing at the University of Manchester into audio classification in smart environments, as part of a larger programme of work on human behaviour prediction. It seemed likely that the classifier developed as part of that work could be transferred with little modification to classify COVID-19 coughs. We were able to demonstrate a proof of concept that makes it possible for this diagnostic technology to be used at scale in smart home devices for early diagnosis of the virus before patients seek clinical treatment, particularly in cases where they might not seek help until the symptoms have significantly progressed.

Contribution to research
A demonstration of high accuracy classification of COVID-19 coughs using an MFCC CNN machine learning architecture.

Use case
In developing our classifier we proposed a use case scenario involving a smart home device, such as an Amazon Alexa or Google Home, in a domestic setting (though the scenario could equally apply to a workplace or other location). The device passively monitors for cough sounds and, upon a positive classification of a COVID-19 cough, prompts the user to seek a professional medical diagnosis, or even calls the relevant local services on the user's behalf. This use case is shown in Figure 1.

Methods
In developing our classifier we chose to employ an MFCC CNN architecture, which has produced proven, high accuracy results for audio classification [2][6][9][13]. The available datasets contained several different types of audio, including coughs, breathing, and speech (vowel sounds). For the initial investigation we chose to focus on one of these sounds (coughs), which we hypothesised would boost CNN performance by training against binary class options, rather than attempting a multi-class CNN with different types of audio, which could negatively bias our results.

Dataset
Our classifier is trained on data from three datasets. We used Google's AudioSet [4] for the baseline not covid labelled coughs. This audio is taken from multiple YouTube video clips, and the cough class has been given a 100% quality rating 3 by human verification. The COVID-19 cough audio data comes from two other sources: the Coswara project [10], which is available on Github and is gathered through the Coswara web application, which handles data collection and quality control; and the Stanford University led Virufy mobile app, which collects data and also makes it available on Github.
The Virufy and Coswara datasets were combined due to the low number of available COVID-19 cough audio samples, giving a combined total of 17 COVID-19 cough audio samples. One of these was dropped from the dataset because our code could not read it, leaving a total of 16 COVID-19 audio samples for training and validation.
The data was split to create a balanced train/test dataset of 28 audio samples (14 each for the covid and not covid labels), along with a validation set of 40 audio samples, two of which carry the covid label, with the remaining 38 labelled not covid.
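The split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual pipeline code: the helper name, the use of Python's random module, and the fixed seed are all assumptions.

```python
import random

def make_splits(covid_files, not_covid_files, n_train_per_class=14, seed=0):
    """Build a balanced training set (n_train_per_class samples per label)
    and put all remaining samples into the validation set."""
    rng = random.Random(seed)
    covid = list(covid_files)
    not_covid = list(not_covid_files)
    rng.shuffle(covid)
    rng.shuffle(not_covid)
    # Label 1 = covid, 0 = not covid
    train = [(f, 1) for f in covid[:n_train_per_class]] + \
            [(f, 0) for f in not_covid[:n_train_per_class]]
    val = [(f, 1) for f in covid[n_train_per_class:]] + \
          [(f, 0) for f in not_covid[n_train_per_class:]]
    rng.shuffle(train)
    return train, val
```

With the 16 covid samples and 52 not covid samples used here, this yields the 28-sample balanced training set and a 40-sample validation set containing 2 covid samples.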
The audio data is loaded and processed into MFCCs, as shown in Figure 2, using the Librosa Python library [8]. Each MFCC, in the form of a 120 × 431 × 1 tensor, is passed to the model for training along with its corresponding label. Our audio data and labels have been made available online for further research.

3 AudioSet cough class quality rating, http://archive.is/MZMRJ

By converting the audio into an MFCC image, as shown in Figure 2, we can use an image classifier (already a highly developed area of machine learning) for classification of the audio. We used a deep convolutional neural network, shown in Figure 7, with multiple hidden layers and a binary dense output layer for label classification.
Our model was trained using the combined balanced dataset of 28 audio samples, 14 of each label, on the University of Manchester's Research IT Computational Shared Facility (CSF) computing cluster using a NVIDIA Tesla V100 GPU. Once trained, the model was passed the validation dataset of 40 audio samples (2 with the covid label) via a testing API. The results were then logged to an SQL database.
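A compact Keras sketch of this kind of network is shown below. The specific layer sizes, filter counts, and optimizer are illustrative assumptions, since the exact architecture is given only in Figure 7; the sketch only reproduces the stated properties: a deep CNN over 120 × 431 × 1 MFCC inputs with multiple hidden layers and a binary dense output.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(120, 431, 1)):
    """Deep CNN with a single sigmoid output for covid / not covid."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # binary label output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

Training is then a standard `model.fit` call on the stacked MFCC tensors and 0/1 labels; with only 28 training samples, a small network of this kind helps limit overfitting.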
The test framework is shown in Figure 6.

Results
Training the CNN model yielded 100% accuracy on the small training dataset, as shown in the confusion matrix in Figure 3 and the plots in Figures 4 and 5. Testing with the validation dataset, which is data the model had not seen before, yielded an accuracy of 97.5%, with a single false positive result for a not covid label.
Both of the covid labels in the validation dataset were correctly classified. The validation dataset results can be seen in Table 2, and a comparison of our results against similar research work can be seen in Table 1.

Discussion
Our CNN classifier proof of concept demonstrates that high accuracy audio classification of COVID-19 coughs is possible, and could be deployed as a software application in a multitude of ubiquitous smart devices such as mobile phones and smart speakers. Deploying this technology to existing devices that passively monitor for trigger sounds could rapidly improve COVID-19 early detection rates in technologically developed countries.
Our results, and also those of Imran et al, confirm that an MFCC CNN approach produces superior classification results compared to other types of classifiers such as Logistic Regression, Support Vector Machines, and Random Forests.

Conclusion
Our classifier demonstrated a high accuracy of 97.5% compared to the other studies, marginally outperforming the Imran et al CNN model, which achieved 96.76% accuracy; however, we were able to train our model using a much smaller dataset. We also showed that an MFCC CNN architecture significantly outperforms the Brown et al classifier algorithms of Logistic Regression and Gradient Boosting Trees with Support Vector Machines.

Further work
There are many more opportunities for further research in this area, particularly training a classifier with a larger COVID-19 dataset, perhaps from the Brown et al project at the University of Cambridge, who are working on open sourcing their crowdsourced data, or from one of the ongoing clinical studies such as the NIH Audio Data Collection for Identification and Classification of Coughing 9, which includes COVID-19 data collection. These projects may be able to provide a larger training dataset that has been through a much more rigorous collection process.

Figure 1: Use case diagram
Figure 2: MFCC for XU dxNFXH7U 10.000 20.000.wav audio