In this section, we describe the methods used to build a robust TMJOA diagnosis model (Fig. 1): 1) cross-validation and grid search, 2) feature selection, and 3) learning using privileged information.
2.2.2 Feature selection
Feature selection is a common dimensionality reduction technique for building a machine learning model. Adding features often lowers the apparent prediction error, but it also increases the risk of overfitting, particularly with small datasets. Here, we customized a feature selection method that takes advantage of privileged variables and mutual information to improve the performance of the classifier.
The normalized mutual information feature selection (NMIFS) method and its modified version, NMIFS+, were used to measure the relevance and redundancy of features, with the primary objective of high accuracy at the lowest possible time complexity [30]. NMIFS+ extends the NMIFS algorithm with the LUPI framework: it takes full account of the privileged features along with the standard features and performs feature selection on the two sets separately [31]. NMIFS+ was applied to all the LUPI models in this study and, correspondingly, NMIFS to the non-LUPI models.
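For illustration, a minimal sketch of the greedy NMIFS selection loop is shown below. This is not the authors' implementation; it assumes integer-discretized features and uses the standard NMIFS criterion (relevance to the label minus mean normalized redundancy with the already-selected features):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(x):
    """Shannon entropy of a discrete (integer-coded) variable."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmifs(X, y, k):
    """Greedy NMIFS sketch: at each step, pick the feature maximizing
    I(f; y) minus the average normalized redundancy with selected features.
    X is assumed to be discretized (integer-coded)."""
    n_features = X.shape[1]
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    remaining = [j for j in range(n_features) if j not in selected]
    while len(selected) < k and remaining:
        scores = []
        for j in remaining:
            # Normalized MI: I(f_j; f_s) / min(H(f_j), H(f_s)); the small
            # constant guards against constant (zero-entropy) features.
            redundancy = np.mean([
                mutual_info_score(X[:, j], X[:, s])
                / max(min(entropy(X[:, j]), entropy(X[:, s])), 1e-12)
                for s in selected
            ])
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the NMIFS+ variant described above, a loop of this kind would be run separately on the standard and privileged feature sets.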
2.2.3 LUPI framework
The idea of learning using privileged information (LUPI) was first proposed by Vapnik and Vashist as a way to capture the essence of teacher-student learning [32]. In contrast to the existing machine learning paradigm, in which the model learns and makes predictions from a fixed set of information, the LUPI paradigm admits several specific forms of privileged information, much as a teacher supplies students with additional comments, explanations, and logic, thereby increasing learning efficiency.
In the classical binary classification setting, we are given training pairs $(x_1, y_1), \ldots, (x_l, y_l)$, where $x_i \in X$, $y_i \in \{-1, 1\}$, $i = 1, \ldots, l$, and each pair is generated independently by some unknown underlying distribution $P_{XY}$. The model is trained to find, among a given set of functions $f(x, \alpha)$, $\alpha \in \Lambda$, the function $y = f(x, \alpha)$ that minimizes the probability of incorrect classification over the unknown distribution $P_{XY}$.
In the LUPI framework, we are instead given training triplets $(x_1, x_1^*, y_1), \ldots, (x_l, x_l^*, y_l)$, where $x_i \in X$, $x_i^* \in X^*$, $y_i \in \{-1, 1\}$, $i = 1, \ldots, l$. Each triplet is generated independently by some underlying distribution $P_{XX^*Y}$, which is again unknown. The privileged information $x_i^*$ is available only for the training examples, not at test time. In this scenario, we can exploit $X^*$ to improve learning performance.
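The operational constraint is that $x^*$ is consumed only during training. The toy sketch below (entirely hypothetical data; a simple distillation-style transfer rather than Vapnik's SVM+ formulation) illustrates this data flow:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Standard features X, privileged features X_star, labels y:
# X_star exists only for the training set, never at test time.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_star_train = rng.normal(size=(100, 3))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(20, 5))

# A "teacher" model sees the privileged features.
teacher = LogisticRegression().fit(X_star_train, y_train)
soft_labels = teacher.predict_proba(X_star_train)[:, 1]

# The "student" sees only the standard features; here the teacher's
# confidence in each true label is folded in as a sample weight
# (one simple way to transfer privileged information).
weights = np.where(y_train == 1, soft_labels, 1 - soft_labels)
student = LogisticRegression().fit(X_train, y_train, sample_weight=weights)

predictions = student.predict(X_test)  # test-time prediction uses X only
```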
There are several implementations of LUPI models. One of them is based on the random vector functional link (RVFL) network, a randomized version of the functional link neural network [33, 34]. A kernel-based RVFL, called KRVFL+, has been proposed within the LUPI paradigm [35]. It provides an efficient way to apply kernel tricks to highly complicated nonlinear feature mappings and to train RVFL networks with privileged information (Fig. 2). The parameters (weights and biases) from the input layer to the hidden layer are generated randomly from a fixed domain, and only the output weights need to be computed.
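A minimal RVFL sketch (omitting the kernel trick and the privileged-information term that KRVFL+ adds) makes the last point concrete: the random hidden-layer parameters stay fixed, and only the output weights are solved for, in closed form:

```python
import numpy as np

def rvfl_fit_predict(X_train, y_train, X_test, n_hidden=50, lam=1e-2, seed=0):
    """Minimal RVFL sketch for binary labels in {-1, +1}: random
    input-to-hidden weights are fixed; only the output weights are
    computed, via regularized least squares."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.uniform(-1, 1, size=(d, n_hidden))  # random weights, fixed domain
    b = rng.uniform(-1, 1, size=n_hidden)       # random biases

    def features(X):
        # Nonlinear enhancement nodes plus the direct input-to-output link.
        H = np.tanh(X @ W + b)
        return np.hstack([X, H])

    D = features(X_train)
    # Closed-form ridge solution for the output weights beta.
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y_train)
    return np.sign(features(X_test) @ beta)
```

Because no iterative backpropagation is needed, training reduces to a single linear solve, which is what makes the RVFL family attractive for small clinical datasets.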