In this study, we used 31-dimensional features calculated from the measured data by signal processing. Figure 4 shows a schematic of the signal processing. First, to obtain low-frequency features, we applied a simple moving average with a window of 300 points to each separated measured time-domain waveform to extract its envelope. From the segment of each envelope between one-third and two-thirds of the record length, we calculated the mean, standard deviation (SD), minimum (Min), maximum (Max), and gradient. In addition, the number of peaks of the envelope between one-eighth and six-eighths of the record length was counted. To obtain high-frequency features, we applied a Butterworth filter, whose passband edge frequency was 10 Hz and stopband edge frequency was 40 Hz, to the measured time-domain waveforms, and the filtered data were normalized by z-scoring. Next, a fast Fourier transform (FFT) was used to calculate the frequency spectra. We used the librosa library, a Python module for audio and music processing, to obtain the chromagram, zero-crossing rate, spectral centroid, spectral bandwidth, roll-off frequency, and coefficients 1–20 of the Mel-frequency cepstral coefficients (MFCCs) [13, 14].
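The feature-extraction steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the sampling rate, the synthetic waveform, the Butterworth attenuation specifications, and the lowpass design (inferred from the stated 10 Hz passband and 40 Hz stopband edges) are all assumptions; the librosa stage (chromagram, MFCCs, etc.) is indicated in a comment.

```python
import numpy as np
from scipy.signal import buttord, butter, filtfilt, find_peaks

def moving_average(x, window=300):
    # Simple moving average over `window` points to extract the envelope.
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic stand-in for one separated measured waveform (fs is an assumption).
fs = 1000
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.sin(2 * np.pi * 25.0 * t)

# Low-frequency features from the envelope between 1/3 and 2/3 of the record.
env = moving_average(np.abs(x), window=300)
n = len(env)
mid = env[n // 3 : 2 * n // 3]
slope = np.polyfit(np.arange(len(mid)), mid, 1)[0]  # gradient via linear fit
low_feats = [mid.mean(), mid.std(), mid.min(), mid.max(), slope]

# Peak count between 1/8 and 6/8 of the envelope.
seg = env[n // 8 : 6 * n // 8]
n_peaks = len(find_peaks(seg)[0])

# Butterworth filter with passband edge 10 Hz and stopband edge 40 Hz
# (gpass/gstop attenuations are assumptions), then z-score normalization.
order, wn = buttord(wp=10, ws=40, gpass=3, gstop=40, fs=fs)
b, a = butter(order, wn, btype="low", fs=fs)
xf = filtfilt(b, a, x)
xf = (xf - xf.mean()) / xf.std()

# From here, the FFT and librosa features would follow, e.g.
# librosa.feature.mfcc(y=xf.astype(np.float32), sr=fs, n_mfcc=20).
spectrum = np.abs(np.fft.rfft(xf))
```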
Candidates for the regressors and classifiers were the decision tree (DT, classifier only) [15], random forest (RF) [16], support vector machine (SVM) [17], gradient-boosted regression tree (GBRT) and gradient-boosted decision tree (GBDT) [18], AdaBoost ensemble algorithm [19], and a two-layer neural network (NN) [20]. To avoid overtraining, we employed fivefold cross-validation for the regression analysis and stratified group K-fold cross-validation (K = 5) for the classification. In the regression analysis, the \({R}^{2}\) score (coefficient of determination) and the mean absolute error (MAE) were used as criteria for evaluating the performance of supervised learning, whereas accuracy and recall were used in the classification. In general, the accuracy, true-positive rate (TPR), true-negative rate (TNR), false-negative rate (FNR), and false-positive rate (FPR) are used as criteria for evaluating the performance of supervised machine learning, as defined by the following formulas:
Accuracy = (TP + TN) / (TP + FP + TN + FN) (1)
TPR = Recall = TP / (TP + FN) (2)
TNR = TN / (FP + TN) (3)
FNR = FN / (TP + FN) (4)
FPR = FP / (FP + TN) (5),
where TP, TN, FP, and FN are the numbers of correct predictions for positive and negative subjects and of incorrect predictions for positive and negative subjects, respectively. To compute these indexes, we divided the measured data of each subject into training data (90%) for supervised learning and test data (10%) for evaluation.
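The four counts map directly onto Eqs. (1)–(5); a minimal helper (the example counts are hypothetical, not results from the study):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Evaluation indexes (1)-(5) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # Eq. (1)
        "tpr": tp / (tp + fn),                        # recall, Eq. (2)
        "tnr": tn / (fp + tn),                        # Eq. (3)
        "fnr": fn / (tp + fn),                        # Eq. (4), = 1 - TPR
        "fpr": fp / (fp + tn),                        # Eq. (5), = 1 - TNR
    }

m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
# accuracy = 85/100 = 0.85, tpr = 40/50 = 0.8, fpr = 5/50 = 0.1
```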
The training and validation were performed on a Windows 10 Pro personal computer (CPU: Intel Core i9, memory: 192 GB, SSD: 1 TB, GPU: NVIDIA GeForce RTX 2080 Ti). The Python libraries scikit-learn, Keras, TensorFlow, and XGBoost were used to construct the machine-learning models.