1. Patients
This prospective study was performed at a single university center from March 2021 to April 2022 among patients who underwent elective tracheostomy. Patients who required ventilator care after tracheostomy or underwent an emergency tracheostomy in the emergency room or intensive care unit (ICU), patients younger than 20 years, and pregnant patients were excluded. Using these criteria, 23 patients with tracheostomy were enrolled in this study, and we obtained the following clinical information for all of them. The study was approved by the institutional review board (IRB) of Bucheon St. Mary's Hospital, the Catholic University of Korea (The physiologic changes of trachea according to the degree of sputum after tracheostomy, HC20ONSI0106, approved November 10, 2020). All procedures followed the ethical standards of the IRB and the Helsinki Declaration of 1975.
2. Recording system
Breathing sound samples were recorded with a voice recorder (Model PCM-A10; Sony, Japan) using a condenser microphone (Model ECM-CS3; Sony, Japan) positioned 2 to 3 cm from the outer opening of the tracheostomy tube, in line with the direction of the tube. The recording format was linear pulse code modulation, which captures the original sound without compression.
3. Data collection and classification
All data collection started in the ICU immediately after surgery. Patients were transferred to the general ward two or three days after tracheostomy, and the existing tracheostomy tube was replaced with a new fenestration-type tracheostomy tube three to five days after surgery. We collected data only until the tube was changed. As a result, all patients were recorded for an average of 12 to 16 hours a day for 1 to 5 days after tracheostomy.
Breathing sounds with severe background noise and very low breathing sounds that could not be detected in the spectrogram, for reasons such as sleep, were excluded. Breathing sounds recorded during periods when the patients reported severe dyspnea or medical staff observed airway problems were also excluded. Although patients sometimes noticed sputum coming out of the tracheostomy tube, the finally included data showed no other abnormal symptoms such as dyspnea. Two otorhinolaryngologists, each with more than 10 years of experience, listened to all the recorded samples and analyzed the spectrogram waveforms. All breathing sounds were classified into three categories: normal breathing sound (NS); low-frequency vibrant breathing sound (VS), indicating a movable obstacle, such as sputum, in the tracheostomy tube that requires suctioning; and high-frequency sharp breathing sound (SS), indicating a fixed obstacle, such as crusts or blood clots, in the tracheostomy tube that requires suctioning or replacement of the inner cannula. When the expert opinion and the spectrogram waveform disagreed, the spectrogram result was adopted. A sound that was both vibrant and sharp was classified as VS.
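For concreteness, these adjudication rules can be written as a small decision procedure. The sketch below is our own Python illustration, not code from the study; the function names and string labels are hypothetical.

```python
# Hypothetical illustration of the labeling rules described above;
# names and labels are ours, not from the study code.

def adjudicate(expert_label: str, spectrogram_label: str) -> str:
    """When expert opinion and the spectrogram waveform differ,
    the spectrogram result is adopted."""
    return spectrogram_label

def classify(is_vibrant: bool, is_sharp: bool) -> str:
    """Assign NS/VS/SS; a sound that is both vibrant and sharp counts as VS."""
    if is_vibrant:      # vibrant takes precedence over sharp
        return "VS"
    if is_sharp:
        return "SS"
    return "NS"

assert adjudicate("NS", "VS") == "VS"
assert classify(is_vibrant=True, is_sharp=True) == "VS"
```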
All patients also wore components of a polysomnography device during the study. The oronasal airflow sensor was placed at the opening of the tracheostomy tube and did not interfere with breathing or recording. However, electroencephalogram, electrooculogram, and electromyography sensors were not applied. A photograph of the devices installed on each patient is shown in Fig. 1.
4. Breathing sound data processing methods
In this study, we converted the breathing sound samples into spectrograms and Mel-frequency cepstral coefficients (MFCCs), audio features widely used to analyze respiratory status. The details of the converted features are described in the following sections. All data processing for sound classification by multiple AI algorithms, such as spectrogram conversion and MFCC extraction, was performed in MATLAB R2019a.
4 − 1. Spectrogram conversion
An audio spectrogram is a two-dimensional image that simultaneously presents a sound's waveform and spectrum. By representing the continuously changing spectrum as a single data sample, spectrograms provide rich audio information and are widely used in deep learning frameworks based on image classification 7–9. The breathing sound samples were converted into spectrograms using a short-time Fourier transform. A more detailed description of the conversion process is presented in Supplementary 1.
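As a rough illustration of this step, the sketch below computes a log-scaled spectrogram from a mono WAV file with SciPy. The file name and STFT parameters (Hann window, 1024-sample frames, 50% overlap) are assumptions made for illustration; the study's actual settings are given in Supplementary 1.

```python
# Illustrative spectrogram conversion via short-time Fourier transform;
# parameters are assumed, not the study's settings (see Supplementary 1).
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, audio = wavfile.read("breath_cycle.wav")   # hypothetical file name
audio = audio.astype(np.float64)
audio /= np.max(np.abs(audio)) + 1e-12         # normalize amplitude

f, t, Sxx = spectrogram(
    audio, fs=fs,
    window="hann",
    nperseg=1024,                              # ~23 ms frames at 44.1 kHz
    noverlap=512,                              # 50% overlap
)
log_spec = 10 * np.log10(Sxx + 1e-12)          # dB scale for display/CNN input
```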
Examples of the time-domain waveforms and spectrograms of breathing sounds are shown in Fig. 2. In NS, because the airway had no obstacles and there was almost no friction, the acoustic energy was relatively small and lay mainly below 2,000 Hz. In the abnormal breathing sounds, by contrast, the acoustic energy was scattered over a large range, from 500 to 12,000 Hz, because the obstacles generated sounds of various frequencies. VS showed a repetitive pattern at frequencies below 100 Hz that occurred when a movable obstacle blocked the trachea or tracheostomy tube; this pattern appears as multiple vertical lines in the spectrogram. In contrast, SS occurred when stiff or fixed obstacles narrowed the cross-section of the airway, inducing a wide range of high-frequency breathing sounds that were more continuous than VS; the pattern of SS appears as multiple horizontal lines in the spectrogram.
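One simple way to quantify this contrast is the fraction of spectral energy below 2,000 Hz, the band in which NS energy is concentrated. The helper below, reusing the f and Sxx arrays from the previous sketch, is a hypothetical illustration; only the 2 kHz threshold comes from the text.

```python
# Hypothetical band-energy check; only the 2,000 Hz threshold follows the text.
import numpy as np

def low_band_energy_ratio(f: np.ndarray, Sxx: np.ndarray,
                          cutoff_hz: float = 2000.0) -> float:
    """Fraction of total spectral energy below cutoff_hz."""
    low = Sxx[f < cutoff_hz].sum()
    return float(low / (Sxx.sum() + 1e-12))

# A ratio near 1.0 is consistent with NS; a markedly lower ratio indicates
# energy scattered into higher bands, as observed for VS and SS.
```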
4 − 2. MFCC extraction
The MFCC is a group of audio parameters modeled on human auditory characteristics and has been widely applied to speech recognition 10,11 and respiratory diagnosis 12–14. In this study, we used MFCCs as tracheal breathing sound features for machine learning–based classifiers. A more detailed description of the extraction process and the designed filter banks is presented in Supplementary 2 and 3.
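A minimal extraction sketch, assuming the librosa library, is shown below; the number of coefficients and the frame settings are illustrative defaults, not the study's filter-bank design, which is detailed in Supplementary 2 and 3.

```python
# Illustrative MFCC extraction; coefficient count and frame settings are
# assumed defaults, not the study's filter-bank design.
import librosa

audio, fs = librosa.load("breath_cycle.wav", sr=None)   # hypothetical file
mfcc = librosa.feature.mfcc(
    y=audio, sr=fs,
    n_mfcc=13,            # a common default, not the study's setting
    n_fft=1024,
    hop_length=512,
)
# mfcc has shape (n_mfcc, n_frames); averaging over time yields a
# fixed-length feature vector for a classifier.
feature_vector = mfcc.mean(axis=1)
```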
5. Breathing sound classification methods
5 − 1. MFCC-based machine learning classification methods
For MFCC-based breathing sound classification, a support vector machine (SVM) 15,16 and k-nearest neighbor (kNN) 17,18, which are widely used for health status diagnosis with MFCCs, were employed. All breathing sound classification with the machine learning algorithms was performed on a desktop machine with an Intel i5-10500F CPU and an NVIDIA GeForce GTX 1660 Ti (6 GB) GPU. A more detailed description of the classification process is presented in Supplementary 4.
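As a sketch of how such classifiers are typically set up, the example below trains an SVM and a kNN on MFCC feature vectors with scikit-learn. The placeholder data, feature scaling, and hyperparameters are our assumptions, not the study's configuration (see Supplementary 4).

```python
# Illustrative SVM/kNN training on MFCC vectors; data and hyperparameters
# are placeholders, not the study's configuration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 13))                 # placeholder MFCC feature vectors
y = rng.choice(["NS", "VS", "SS"], size=120)   # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

for name, clf in [("SVM", svm), ("kNN", knn)]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))
```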
5 − 2. Spectrogram-based deep learning method: Convolutional neural network (CNN)
The CNN is a deep learning algorithm widely and successfully established in the fields of image classification and pattern recognition 19. The basic building block of a CNN is the convolution layer, which computes the tensor passed to the next layer through a convolution between the input tensor and a kernel. A CNN topology comprises many convolution layers designed according to factors such as kernel size and number. A CNN provides a framework for learning the features common to the images in a data group without manual feature extraction, and it can produce accurately trained models for pattern recognition or image classification. Biomedical signal classification studies have been accelerated by converting signals to images and learning them with CNNs 7–9. For example, spectrogram-based CNN classification has increased the accuracy of respiratory pattern classification in many clinical fields. In this study, we applied spectrograms converted from one cycle of respiratory data to the following CNN topologies: AlexNet 20, VGGNet 21, ResNet 22, Inception_v3 23, and MobileNet 24. The CNN classification was conducted on Lenovo Intelligent Computing Orchestration with a batch size of 32 and a maximum of 200,000 iterations.
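As an illustration of this pipeline, the sketch below fine-tunes one of the listed topologies (ResNet-18, via torchvision) on spectrogram images arranged in class folders. The batch size of 32 follows the text; the folder layout, optimizer, learning rate, and input size are illustrative assumptions, not the study's configuration.

```python
# Illustrative CNN fine-tuning on spectrogram images; batch size follows the
# text, everything else (layout, optimizer, learning rate) is assumed.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),             # input size expected by ResNet
    transforms.ToTensor(),
])
# Hypothetical layout: spectrograms/NS/*.png, spectrograms/VS/*.png, spectrograms/SS/*.png
dataset = datasets.ImageFolder("spectrograms", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 3)  # three classes: NS, VS, SS

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```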