The proposed system uses the Wavelet transform (WT) to find discriminative features in chest X-ray images and an SVM to classify the extracted features. The WT is well known for its energy compaction capability. The proposed system preprocesses the chest X-ray image with the WT, which produces a set of approximation coefficients that includes a limited number of high-magnitude (high-energy) coefficients. The proposed system introduces a novel threshold scanning technique that extracts only selected high-energy approximation coefficients. The small number of extracted coefficients is encoded and used as the feature vector representing the input image. These features are then applied to an SVM for classification (normal or COVID-19). A block diagram showing the main stages of the proposed system is depicted in Fig. 1.
The first operation performed by the proposed system is to apply the WT to the input image, which can be any chest X-ray image in the employed dataset.
3.1 Chest X-ray Image Dataset
The chest X-ray images representing COVID-19 cases were obtained from Cohen [16], who gathered them from different public sources. At the time of writing this report, the database contained 125 chest X-ray images diagnosed with COVID-19. The images were gathered from 43 females and 82 males whose diagnoses were positive. The images had different formats (png, jpg, and jpeg). A total of 88 positive cases were taken from this database. Fig. 2 (top) shows sample COVID-19 images from this database. This database, however, does not contain normal (negative) cases. Fortunately, normal chest X-rays are abundantly available. In this study, the normal chest X-ray images were obtained from the ChestX-ray8 database provided by Wang et al. [17]. ChestX-ray8 comprises 108,948 frontal-view X-ray images of 32,717 patients. For this study, only 88 normal (no-findings) images were taken from this database. Fig. 2 (bottom) shows sample images drawn from the ChestX-ray8 database. Hence, our dataset consisted of 88 COVID-19 images and 88 normal images, giving a total of 176 chest X-ray images. Since the number of COVID-19 images was limited, 80% of the images were used for training and the remaining 20% were used for testing. Other public COVID-19 chest X-ray images can be found in [18-20].
The original images comprising the employed dataset had different spatial and intensity resolutions. All images were first converted to 8-bit gray-scale images with a spatial resolution of 512 × 512 pixels.
3.2 Support Vector Machines
SVMs, originally proposed by Cortes and Vapnik [21], are supervised machine learning algorithms that have been widely implemented in regression and classification applications. The SVM is regarded as one of the leading Artificial Intelligence (AI) algorithms. SVMs can solve linear and non-linear problems. As depicted in Fig. 3, an SVM classifies data by finding the examples of each class that lie closest to the other class (support vectors) and determining the best hyperplane that separates the data points of the classes. For two-dimensional (2-D) data, the hyperplane becomes a simple line. An SVM tries to find the widest possible margin that separates the two classes with no interior data points.
SVMs were originally designed to be binary, or two-class, classifiers. However, SVMs have been extended to handle data composed of more than two classes [22]. SVMs have shown remarkable success in solving linear and non-linear classification problems [23-24]. A scatter diagram of the dataset decomposed at Level 2, using the Haar wavelet, is shown in Fig. 4. The scattered data is a collection of the approximation vectors representing all the images in the employed dataset. The support vectors in Fig. 4 are indicated by circles drawn around them.
The SVM algorithm implemented here uses the Gaussian kernel defined by: (see Equation 1 in the Supplementary Files)
where σ is a user-defined variance parameter.
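As an illustration, the sketch below evaluates a Gaussian kernel in one commonly used form, K(x, y) = exp(-||x - y||^2 / (2σ^2)). This form is an assumption: Equation 1 is not reproduced in this section and may scale σ differently. The function name `gaussian_kernel` is hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel in the ASSUMED form exp(-||x - y||^2 / (2 * sigma^2)).

    The scaling of sigma is an assumption; Equation 1 may differ.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical vectors give the maximum similarity of 1.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
# Similarity decays as the vectors move apart.
apart = gaussian_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)
```

Under this assumed form, a larger σ makes the kernel decay more slowly with distance, so more distant training examples influence the decision boundary.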
The Gaussian kernel is a general-purpose kernel that can be used when there is no prior information about the data. Other kernels include the polynomial kernel, the Laplace RBF kernel, and the hyperbolic tangent (sigmoid) kernel. The second stage of the proposed system is to use the WT to obtain discriminative features from the input image.
3.3 Wavelet Transform and Optimum Threshold Level
The WT, or Wavelet decomposition, is a mathematical transform that provides another way of representing the input signal or image. The WT is a lossless, energy-invariant transform, which means that the signal’s energy does not change when it is transformed [25-26].
The Wavelet decomposition tree, depicted in Fig. 5, illustrates the major operations performed by Wavelet decomposition on an input signal. At the first level of decomposition, the input signal produces approximation and detail coefficients. The approximation coefficients represent the low-frequency content of the input signal and the detail coefficients represent the high-frequency content. At the second level of decomposition, the first-level approximation coefficients are themselves split into a new set of approximation coefficients and a new set of detail coefficients, each half the length of the first-level approximation vector. At each subsequent level, the decomposition again splits the current approximation coefficients into two new vectors [27-29].
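The decomposition tree and the energy-invariance property can be sketched for a 1-D signal with the Haar wavelet, the wavelet named in Section 3.2. This is a minimal illustration, not the implementation used in this study: the function names and the sample signal are hypothetical, the proposed system operates on images rather than 1-D signals, and the sign convention of the detail coefficients varies between libraries.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums carry the low-frequency content.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences carry the high-frequency content.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Split the approximation repeatedly; returns (final approx, list of details)."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def energy(values):
    """Signal energy, taken here as the sum of squared values."""
    return sum(v * v for v in values)


x = [4, 6, 10, 12, 8, 6, 5, 5]  # hypothetical sample signal
a2, (d1, d2) = haar_decompose(x, levels=2)
total = energy(a2) + energy(d1) + energy(d2)
```

For this 8-sample input, two levels leave two approximation coefficients, and the summed energy of all coefficient sets matches the energy of the input up to floating-point rounding.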
The WT possesses a strong energy compaction property, since most of the energy of the transformed signal is concentrated in a few large coefficients. This property implies that small coefficients can be replaced by zeros without introducing a large distortion in the reconstructed signal. The energy compaction property of the WT has been successfully utilized in image compression schemes, such as the JPEG 2000 compression scheme [30]. In data compression methods, only the wavelet coefficients that contain most of the signal energy are retained for use in signal reconstruction. In the proposed system, we exploit this energy compaction property of the WT to form a discriminative feature vector representing the input image. The proposed hard-thresholding scheme is given by: (see Equation 2 in the Supplementary Files)
where Ĉ(i) and C(i) are the i-th approximation coefficients after and before thresholding, respectively, and T is the threshold level.
Equation 2 indicates that small-valued coefficients can be eliminated by setting to zero all coefficients whose magnitudes are less than a certain threshold level. An illustration of the proposed scanning technique, using a threshold value of 3, is shown in Table-2.
Table-2 Proposed thresholding scheme
| Input vector | Output vector |
|---|---|
| -10 | 10 |
| -2.8 | 0 |
| 3 | 3 |
| 2.3 | 0 |
| 100 | 100 |
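The entries of Table-2 can be reproduced with the minimal sketch below. It assumes that a coefficient is kept when its magnitude is at least T and that the kept value is the magnitude itself, since the table maps -10 to 10. Equation 2 is not reproduced in this section, so both points are inferred from the table rather than confirmed. The function name `hard_threshold` is hypothetical.

```python
def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude is below t.

    ASSUMPTION (inferred from Table-2): surviving coefficients are
    returned as magnitudes, so -10 becomes 10. If the sign should be
    preserved instead, return c rather than abs(c).
    """
    if t < 0:
        raise ValueError("threshold must be nonnegative")
    return [abs(c) if abs(c) >= t else 0 for c in coeffs]


table_input = [-10, -2.8, 3, 2.3, 100]
table_output = hard_threshold(table_input, 3)
```

Note that the value 3 survives a threshold of 3 in the table, which is why the comparison in the sketch is "greater than or equal to" rather than strictly greater.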
By selecting a nonnegative threshold, the small approximation coefficients can be reset to zero, resulting in a vector of approximation coefficients consisting mostly of zeros. A thorough description of thresholding methods can be found in [32-33]. The resulting thresholded vector is then encoded by the proposed system, using a modified version of the run-length encoding (RLE) scheme. The RLE scheme is used in the MPEG, JPEG, H.263, and H.261 compression schemes [31]. It replaces a string of identical values with a code that indicates the value and the number of times it occurs. To illustrate the modified RLE scheme employed in this study, consider an approximation vector consisting of 60 zeros. It can be replaced by two numbers: the first is zero, which indicates a run of zeros (spaces), and the second is 60, which indicates the number of zeros in the run. Fig. 6 depicts an illustration of the modified RLE scheme used in this study.
The resulting code vector is used as the feature vector representing the input image. The last step of the proposed system is to present the code vector to an SVM for classification.
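A minimal sketch of zero-run encoding consistent with the 60-zeros example is given below. It assumes that only runs of zeros are collapsed into a (0, run length) pair and that nonzero coefficients are copied unchanged. The exact modification used in this study is defined by Fig. 6, which is not reproduced here, so the details may differ. The function name `encode_zero_runs` is hypothetical.

```python
def encode_zero_runs(values):
    """Replace each run of zeros with the pair (0, run length).

    ASSUMPTION: nonzero values pass through unchanged; the modified
    RLE scheme of the study may encode them differently.
    """
    encoded = []
    run = 0  # length of the zero run currently being scanned
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            # A nonzero value ends the run: emit the marker and the count.
            encoded.extend([0, run])
            run = 0
        encoded.append(v)
    if run:
        # Flush a run that extends to the end of the vector.
        encoded.extend([0, run])
    return encoded


all_zeros = encode_zero_runs([0] * 60)               # [0, 60]
mixed = encode_zero_runs([10, 0, 0, 0, 3, 0, 100])   # [10, 0, 3, 3, 0, 1, 100]
```

Because the code length depends on how many coefficients survive thresholding, a vector that is mostly zeros is reduced to a much shorter code.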