ExpDNN: Explainable Deep Neural Network

In recent years, deep neural networks have been applied to achieve high performance in prediction, classification, and pattern recognition. However, the weights in these deep neural networks are difficult to explain. Although linear regression can provide explainable results, it is not suitable when the inputs interact. Therefore, an explainable deep neural network (ExpDNN) with explainable layers is proposed to obtain explainable results in the presence of input interaction. Three cases were used to evaluate the proposed ExpDNN, and the results showed that the absolute value of a weight in an explainable layer can be used to explain the importance of the corresponding input for feature extraction.


Fig. 1. The structure of ExpDNN
In the proposed ExpDNN, the neuron in the i-th explainable layer ($e_i$), which uses a linear activation function, is computed by Equation (1). The function $f^{(k)}$ denotes the activation function in the k-th hidden layer, and the number of neurons in the (k-1)-th hidden layer is $n_{k-1}$. The weight of the connection between the i-th neuron in the (k-1)-th hidden layer and the j-th neuron in the k-th hidden layer is denoted $w_{i,j}^{(k)}$, and the bias of the j-th neuron in the k-th hidden layer is denoted $b_j^{(k)}$. The j-th neuron in the first hidden layer ($a_j^{(1)}$), which uses a linear activation function, is computed by Equation (2). The j-th neuron in the k-th hidden layer ($a_j^{(k)}$) is computed by Equation (3), where nonlinear functions can be adopted as the activation functions.
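The bodies of Equations (1) to (3) did not survive extraction. Assuming the explainable layer scales each input by a single weight $w_i$ and treating the explainable layer as "layer 0" (so $n_0$ is the number of inputs), they plausibly take the following form; the symbols $e_i$ and $f^{(k)}$ are our notation, and this is a reconstruction from the definitions above rather than the authors' exact equations:

```latex
\begin{align}
e_i       &= w_i \, x_i, \tag{1}\\
a_j^{(1)} &= \sum_{i=1}^{n_0} w_{i,j}^{(1)} \, e_i + b_j^{(1)}, \tag{2}\\
a_j^{(k)} &= f^{(k)}\!\left( \sum_{i=1}^{n_{k-1}} w_{i,j}^{(k)} \, a_i^{(k-1)} + b_j^{(k)} \right), \qquad k \ge 2. \tag{3}
\end{align}
```

Because each $e_i$ depends on exactly one input, $|w_i|$ is the only place where the network can scale $x_i$ before it is mixed with the other inputs, which is what makes it a candidate importance score.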
To obtain the estimated outputs, the weight of the connection between the i-th neuron in the last (l-th) hidden layer and the j-th neuron in the output layer is denoted $w_{i,j}^{(l+1)}$, and the j-th output in the output layer ($y_j$) is computed by Equation (4). The Nesterov-accelerated adaptive moment estimation (Nadam) optimizer [2] is adopted in this study.
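To make the forward pass concrete, the following pure-Python sketch chains an explainable layer and dense layers in the notation above. It assumes the output layer has the same form as a hidden layer, i.e. $y_j = f^{(l+1)}\big(\sum_{i=1}^{n_l} w_{i,j}^{(l+1)} a_i^{(l)} + b_j^{(l+1)}\big)$ with an output bias $b_j^{(l+1)}$. The function names and toy weights are hypothetical, training (e.g., with Nadam) is not shown, and this is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A layer is (weights, bias, activation); weights[i][j] connects neuron i of
# the previous layer to neuron j of this layer, mirroring w_{i,j}^{(k)}.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float], Callable[[float], float]]


def identity(z: float) -> float:
    return z


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def explainable_layer(x: Sequence[float], w: Sequence[float]) -> List[float]:
    # One weight per input and a linear activation: e_i = w_i * x_i.
    return [w_i * x_i for w_i, x_i in zip(w, x)]


def dense(a_prev: Sequence[float], layer: Layer) -> List[float]:
    weights, bias, f = layer
    # a_j = f(sum_i w_{i,j} * a_i + b_j) for every neuron j in this layer.
    return [
        f(sum(a_prev[i] * weights[i][j] for i in range(len(a_prev))) + bias[j])
        for j in range(len(bias))
    ]


def expdnn_forward(x: Sequence[float], w_exp: Sequence[float],
                   layers: Sequence[Layer]) -> List[float]:
    a = explainable_layer(x, w_exp)
    for layer in layers:  # hidden layers 1..l, then the output layer
        a = dense(a, layer)
    return a


if __name__ == "__main__":
    toy_layers = [
        ([[1.0], [1.0]], [0.0], identity),  # first hidden layer, linear
        ([[2.0]], [-2.0], sigmoid),         # output layer (assumed sigmoid)
    ]
    print(expdnn_forward([1.0, 2.0], [0.5, 0.25], toy_layers))  # [0.5]
```

With these toy weights the explainable layer yields [0.5, 0.5], the linear hidden neuron sums them to 1.0, and the output neuron computes sigmoid(2.0 * 1.0 - 2.0) = 0.5.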
Case 1: Case 1 presents a simple application, and Table 1 shows the data in Case 1 [3]. The candidate inputs are g1, g2, g3, g4, and g5, and the output is h; g1, g5, and h are highly correlated. Three subcases were designed to test whether ExpDNN identifies the important inputs: Case 1(1) uses g1 and g2 as inputs; Case 1(2) uses g1, g2, g3, and g4; and Case 1(3) uses g1, g5, g3, and g4. In each subcase, ExpDNN is trained for 60,000 epochs with mean squared error as the loss function.

In Case 1(1), Table 2 shows the structure of the neural network, and g1 and g2 are the inputs of ExpDNN (i.e., x1 and x2). In the trained ExpDNN, the values of w1 and w2 were 1.2318 and 0.5673, respectively, so g1 is more important than g2, and the high correlation between g1 and h could be extracted by the proposed ExpDNN.

Table 3 shows the structure of the neural network in Case 1(2) and Case 1(3). In Case 1(2), g1, g2, g3, and g4 are the inputs of ExpDNN (i.e., x1, x2, x3, and x4). In the trained ExpDNN, the values of w1, w2, w3, and w4 were 1.3499, 0.0544, 0.0520, and 0.0515, respectively, so g1 is more important than the other inputs, and the high correlation between g1 and h could again be extracted. In Case 1(3), g1, g5, g3, and g4 are the inputs of ExpDNN (i.e., x1, x2, x3, and x4). In the trained ExpDNN, the values of w1, w2, w3, and w4 were 1.0047, 1.3884, -0.6093, and -0.6140, respectively. Sorting the inputs by the absolute values of their weights gives the importance order g5, g1, g4, g3, so the high correlation among g1, g5, and h could be extracted by the proposed ExpDNN.
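The ranking step used in Case 1 can be sketched in a few lines. The helper name `rank_by_abs_weight` is hypothetical; it simply sorts input names by $|w_i|$ in descending order, using the weights reported above.

```python
from typing import List, Sequence


def rank_by_abs_weight(names: Sequence[str], weights: Sequence[float]) -> List[str]:
    """Order input names by |w_i|, most important first.

    Ties keep their original input order because sorted() is stable,
    even with reverse=True.
    """
    if len(names) != len(weights):
        raise ValueError("names and weights must have the same length")
    pairs = sorted(zip(names, weights), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in pairs]


# Explainable-layer weights reported for Case 1(2) and Case 1(3).
print(rank_by_abs_weight(["g1", "g2", "g3", "g4"], [1.3499, 0.0544, 0.0520, 0.0515]))
# ['g1', 'g2', 'g3', 'g4']
print(rank_by_abs_weight(["g1", "g5", "g3", "g4"], [1.0047, 1.3884, -0.6093, -0.6140]))
# ['g5', 'g1', 'g4', 'g3']
```

Taking the absolute value matters in Case 1(3): sorting by the signed weights would place g3 (-0.6093) ahead of g4 (-0.6140), even though g4 has the larger magnitude.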
Case 2: Case 2 presents an application with input interaction (i.e., an exclusive-OR gate), and Table 4 shows the data in Case 2 [3]. The candidate inputs are q1, q2, q3, and q4, and the output is r; q1 and q2 interact in determining the value of r. Two subcases were designed to test whether ExpDNN identifies the important inputs when they interact: Case 2(1) uses q1 and q2 as inputs, and Case 2(2) uses q1, q2, q3, and q4. In each subcase, ExpDNN is trained for 60,000 epochs with binary cross-entropy as the loss function.
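Why this case defeats linear regression can be checked directly. The sketch below fits ordinary least squares with an intercept to the standard two-input XOR truth table (an assumption: it uses only q1 and q2, while Table 4 also lists q3 and q4) by solving the normal equations exactly with fractions. The best linear fit has zero slopes and an intercept of 1/2, so it predicts 0.5 for every input and carries no information about which input matters.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Solve a square system a x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)  # assumes non-singular
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_with_intercept(rows: Sequence[Tuple[int, int]], targets: Sequence[int]) -> List[Fraction]:
    """Return [b0, b1, b2] minimizing the squared error of b0 + b1*x1 + b2*x2."""
    X = [[Fraction(1), Fraction(x1), Fraction(x2)] for x1, x2 in rows]
    y = [Fraction(t) for t in targets]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    return solve_linear(xtx, xty)  # normal equations: (X^T X) b = X^T y


XOR_ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]

b0, b1, b2 = ols_with_intercept(XOR_ROWS, XOR_TARGETS)
print(b0, b1, b2)  # 1/2 0 0
```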
In Case 2(1), Table 5 shows the structure of the neural network, and q1 and q2 are the inputs of ExpDNN (i.e., x1 and x2). In the trained ExpDNN, w1 and w2 were both 2.0086, so q1 and q2 are equally important, and both are important for estimating the value of r. In Case 2(2), Table 6 shows the structure of the neural network, and q1, q2, q3, and q4 are the inputs of ExpDNN (i.e., x1, x2, x3, and x4). In the trained ExpDNN, the values of w1, w2, w3, and w4 were 1.9830, 1.9830, 1.0000, and 1.7162, respectively. The weights of q1 and q2 are therefore larger than those of q3 and q4, and the interacting inputs q1 and q2 for estimating the value of r could be extracted by the proposed ExpDNN.
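Equal explainable weights are consistent with the symmetric roles of q1 and q2 in XOR. As an illustration, the sketch below builds an ExpDNN-shaped network by hand (not trained) in which w1 = w2 = 2.0 and a ReLU hidden layer reproduces XOR exactly. All weights are hypothetical, and the output neuron is linear for exact 0/1 values, whereas a network trained with binary cross-entropy would typically use a sigmoid output.

```python
def relu(z: float) -> float:
    return max(0.0, z)


# Hypothetical hand-picked weights, NOT the trained values reported above.
W_EXP = (2.0, 2.0)                # explainable weights w1, w2 (equal)
W_HID = ((0.5, 0.5), (0.5, 0.5))  # W_HID[i][j]: explainable neuron i -> hidden neuron j
B_HID = (0.0, -1.0)
W_OUT = (1.0, -2.0)               # hidden neuron j -> output
B_OUT = 0.0


def forward(q1: float, q2: float) -> float:
    e = (W_EXP[0] * q1, W_EXP[1] * q2)  # explainable layer: e_i = w_i * x_i
    # ReLU hidden layer: with these weights h0 = relu(q1 + q2), h1 = relu(q1 + q2 - 1)
    h = [relu(e[0] * W_HID[0][j] + e[1] * W_HID[1][j] + B_HID[j]) for j in range(2)]
    return h[0] * W_OUT[0] + h[1] * W_OUT[1] + B_OUT  # linear output (assumption)


for q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(q, forward(*q))
```

The subtraction of 2 * relu(q1 + q2 - 1) is what encodes the interaction: it only activates when both inputs are 1, which no single-input term can express.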
Case 3: Case 3 presents a practical application to Anderson's Iris data set. Table 7 shows the structure of the neural network: sepal length, sepal width, petal length, and petal width are the inputs of ExpDNN (i.e., x1, x2, x3, and x4), and the classes setosa, versicolor, and virginica are the outputs (i.e., y1, y2, and y3). ExpDNN is trained for 60,000 epochs with categorical cross-entropy as the loss function. In the trained ExpDNN, the values of w1, w2, w3, and w4 were 0.8870, 0.8834, 1.8384, and 1.8925, respectively. Sorted by the values of the weights (all positive here), the importance order of the inputs is petal width, petal length, sepal length, and sepal width, so petal width and petal length could be extracted by the proposed ExpDNN as the important features for classifying Iris species.
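The categorical cross-entropy loss used in Case 3 compares the three outputs y1, y2, y3 against a one-hot class label. The text does not state the output activation, so the sketch below assumes a softmax output layer; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    # Subtract the largest logit for numerical stability; the result sums to 1.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot: Sequence[float], probs: Sequence[float],
                              eps: float = 1e-12) -> float:
    # -sum_j t_j * log(y_j); eps guards against log(0).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


setosa = [1.0, 0.0, 0.0]  # one-hot target for (setosa, versicolor, virginica)
print(categorical_cross_entropy(setosa, softmax([0.0, 0.0, 0.0])))   # about log(3)
print(categorical_cross_entropy(setosa, softmax([10.0, 0.0, 0.0])))  # close to 0
```

An untrained network that outputs equal logits pays a loss of log 3 per sample, while a confident correct prediction drives the loss toward zero; minimizing this loss is what pushes the explainable weights of the discriminative petal features upward.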