An improved residual-network-based conditional generative adversarial network for plantar pressure image classification: a comparison of normal, planus, and talipes equinovarus feet

As the number of layers in deep learning (DL) models increases and the performance of computing nodes improves, the output accuracy of deep neural networks (DNN) faces a bottleneck. Residual network (RN) based DNN models were recently applied to address this issue. This paper improves the RN and develops a rectified linear unit (ReLU) based conditional generative adversarial net (cGAN) to classify plantar pressure images. A foot scan system collected the plantar pressure images, from which normal (N), planus (PL), and talipes equinovarus (TE) foot data-sets were acquired. Nine foot types, named N, PL, TE, N-PL, N-TE, PL-N, PL-TE, TE-N, and TE-PL, were classified using the proposed DNN model, named residual network-based conditional generative adversarial net (RNcGAN). The work first improves the RN structure and then the cGAN system. In the classification of plantar pressure images, the pixel-level state matrix can be used directly as input, which differs from previous image classification tasks requiring image reduction and feature extraction; the cGAN can directly output the pixels of the image without any simplification. Finally, the model achieved better results in the evaluation indicators of accuracy (AC), sensitivity (SE), and F1-measurement (F1) compared with artificial neural networks (ANN), k-nearest neighbor (kNN), Fast Region-based Convolutional Neural Network (Fast R-CNN), visual geometry group (VGG16), scaled-conjugate-gradient convolutional neural networks (SCG-CNN), GoogleNet, AlexNet, ResNet-50-177, and Inception-v3. The final class prediction accuracy is 95.17%. Foot type classification is vital for producing comfortable shoes in industry.


Introduction
The application of plantar pressure distribution in shoemaking is still under development; the main challenge is intelligent analysis technology specific to plantar pressure data-sets. Scholars have tried to obtain foot data from multiple sources. However, these data are still based on geometric and topological information and currently may not play a key role in constructing comfortable shoes [1] [2]. Analyzing plantar pressure data-sets may reveal the specific biomechanical changes of various foot diseases; tailoring shoes and insoles to each data-set for a given diseased foot, thereby improving its treatment, has become a promising research direction in the biomedical and bioengineering fields [3] [4]. In recent years, plantar pressure gait detection and analysis technology has developed rapidly, and measurement indicators and accuracy (AC) have gradually improved; it has been widely used in sports and in clinical and rehabilitation medicine [5] [6]. The dynamic data-set of plantar pressure, combined with foot biomechanics, sports stability, sports injury, and connections with clinical and rehabilitation medicine, is used for assessing the injury risk of athletes [7] [8] [9]. Excessive local pressure on the soles or irregular foot movements (such as excessive varus or valgus) is also conducted upward along the lower limbs' kinetic chain, which can easily cause chronic injuries to the ankle, knee, hip, and even waist. Gait analysis and plantar pressure test systems provide a scientific method for evaluating and predicting future foot diseases by examining the human body's lower limbs [10].
Through dynamic collection of plantar and sole pressure distribution data-sets, this approach has been successfully applied to the detection of plantar pressure distribution in shoemaking; the shoe's biomechanical properties of cushioning performance, support strength, and stability may thus be evaluated.
One artificial intelligence technique for dealing with plantar pressure image data-sets is the Artificial Neural Network (ANN); ANNs have been widely applied in the fields of pattern recognition (PR), intelligent control (IC), and system modeling (SM) due to their distributed information storage, parallel processing, and self-learning capabilities. The plantar pressure system may directly and accurately reflect changes in the elasticity of the foot's plantar tissue and generate different pressure values in different foot functional areas. These pressure values can be seen as a collection of image pixels with different colors. They may form a time-series pressure curve; these data-sets are transmitted to a computer to create a plantar pressure distribution image. Therefore, image processing technology can be well applied to ANN-based classification. Usually, image classification consists of data pre-processing, feature selection, and feature extraction; the overall pipeline includes image acquisition, image pre-processing, feature extraction, and judgment [11]. At present, image classification methods are either pixel-based or feature-based. The pixel-based classification method mainly uses an image's basic features to classify it. Because the color distribution of object surfaces differs, images can be classified according to color features. The color histogram method was the earliest to apply color features to image classification: it classifies images by the proportion of each color in the image space, but it cannot distinguish the information described by the image. Texture features classify images through the distribution of the image's gray space [12]. Classification based on shape features describes the area enclosed by a closed contour curve.
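As a concrete illustration of the pixel-based color-histogram method described above, the following is a minimal NumPy sketch; the bin count, value range, and normalization are our own assumptions for illustration:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Compute a normalized per-channel color histogram feature vector.

    `image` is an H x W x 3 array with values in [0, 255]; the proportion
    of pixels falling into each bin forms the feature used to compare and
    classify images by their color distribution.
    """
    features = []
    for channel in range(3):
        hist, _ = np.histogram(image[:, :, channel],
                               bins=bins, range=(0, 256))
        features.append(hist / image[:, :, channel].size)  # proportions
    return np.concatenate(features)
```

Two images can then be compared by a distance between their histogram vectors, which is exactly the limitation noted above: the proportions carry no spatial information about what the image depicts.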
The shape feature is generally related to the target object depicted in the image [13].
The classification method based on shape features classifies images by establishing image indexes from contour features and regional features. However, organizing images through classification methods based on image space requires a large amount of computation; the calculation process is very complicated, yet the classification effect is only average. Since the Convolutional Neural Network (CNN) was first applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, CNNs have been widely used in image recognition and classification. Continuous research has led to improved network models and a growing ability of CNNs to learn and extract image features [14]. At the same time, with the emergence of large-scale data sets such as ImageNet and Microsoft Common Objects in Context (MSCOCO), the training intensity of CNNs has continuously increased, giving models more robust generalization capabilities and improving their application to actual image classification problems [15]. For the image classification problem, training the network requires a large number of labeled data-sets to improve its generalization ability, and existing data sets can no longer meet this need; this is the main factor restricting the development and promotion of CNNs [16] [17]. As network depth increases, a degradation problem appears: when the network becomes deeper, the training accuracy (TA) plateaus while the training error grows, and this is not caused by over-fitting. Over-fitting means that the training error of the network keeps getting smaller while the test error gets bigger [18]. To solve this degradation phenomenon, ResNet was proposed. Instead of using multiple stacked layers to directly fit the desired mapping, ResNet fits a residual mapping, letting the stacked nonlinear layers form another mapping of the desired one [19] [20].
The assumption is that optimizing the residual mapping is easier than optimizing the desired mapping. In the extreme case where the desired mapping is the identity mapping, the residual network's task is simply to fit zero, whereas an ordinary network must fit the identity itself, and the former is easier to optimize. The difference between the residual network and an ordinary network is the introduction of skip connections, which let the information of the previous residual block flow into the next residual block without hindrance, improve the flow of information, and avoid the vanishing-gradient and degradation problems caused by an overly deep network. The residual network can be viewed as a fusion of multiple shallow networks [21] [22]. It does not fundamentally solve the vanishing-gradient problem but sidesteps it: because the network is formed by fusing multiple shallow networks, it does not exhibit vanishing gradients during training, which accelerates the network's convergence. Deep residual networks have shown excellent performance and development potential in visual recognition, image generation, natural language processing, speech recognition, and advertising user-group forecasting.
Compared with other generative models, the competitive training of generative adversarial nets (GAN) no longer requires a hypothesized data distribution; it samples from a distribution directly without an explicit formulation of p(x), and the biggest advantage of GAN is that it can closely approximate the real data. However, the disadvantage of this method, which requires no pre-modeling, is that it is too free: for larger pictures with more pixels, the simple GAN-based method is not very controllable. A natural idea is to add some constraints to the GAN, so Conditional Generative Adversarial Nets (cGAN) [23] [24] were developed. This work proposes a GAN with conditional constraints. A conditional variable y is introduced into the modeling of both the generative model (G) and the discriminative model (D); the additional information y conditions the model to guide the data generation process. These condition variables y can be based on various information, such as category labels, part of the data used for image restoration, or data from different modalities. If the condition variable y is a category label, the cGAN can be seen as an improvement that turns a purely unsupervised GAN into a supervised model. This direct and straightforward improvement has proved very effective and has been widely used in subsequent related work. cGAN extends the original GAN: both the generator and the discriminator take the additional information y as a condition, and y can be any information, such as category information or other modal data. cGAN is implemented by feeding the additional information to the discriminative and generative models as part of the input layer. In the generative model, the prior input noise and the condition information form a joint hidden-layer representation. The adversarial training framework is quite flexible in how this hidden-layer representation is composed [25] [26].
The rest of the paper is organized as follows: Sec. 2 presents the basic models and the improved proposed models; Sec. 3 gives the results and discussion; and Sec. 4 draws conclusions and outlines future work.

Experiment and Data Collection
The data collection for plantar pressure imaging was completed by 60 recruited volunteers older than eighteen. Volunteer requirements: typical gait with no related nervous system diseases; no walking instability, abnormal gait, or blurred vision; no severe joint disease; normal muscle strength and normal tendon reflexes; no severe foot pain, foot ulcers, etc. Socks were taken off for examination, shoe-wearing habits were inquired about, and relevant forms were filled in. When collecting and testing, subjects walked at normal speed and repeated the test 10 times. The plantar pressure imaging system divides the plantar surface into ten anatomical regions and measures them. The volume data (discrete values here), covering the medial and lateral heel, midfoot, five metatarsal bones, hallux, and the four other toes, are shown in Fig. 1 (the experiment on plantar pressure data-set acquisition). Normal (N), planus (PL), and talipes equinovarus (TE) feet were classified using the plantar pressure image data-set. Planus means that the foot has lost the normal longitudinal arch and its sole is flat. When both the load-bearing and non-weight-bearing foot exhibit this characteristic, it is called rigid flat foot. When the arch is missing when standing but present when not bearing weight, it is called flexible flat foot. This posture causes the talus to slide past the calcaneus' inner side and contact the ground, which is called flat metatarsal foot [27]. This feature is reflected in the shape of the footprint: most of the sole is in contact with the ground, and the surface area is larger than average. The flat appearance of this group makes it easy to identify patients with this posture [28]. In the flat metatarsal foot, the navicular bone extends from the top of the medial malleolus to the first metatarsal base.
In addition to the flat arch, when viewed from the back, abduction of the toes and pronation of the calcaneus cause the ankle to tilt inward (a foot-valgus posture). When the hind-foot valgus is 4 to 6 degrees, it is a mild flat foot; a moderate flat foot is 6 to 10 degrees, and a severe flat foot is 10 to 15 degrees [29] [30]. In addition to muscle lengthening in flat feet, the ligaments and plantar fascia are also overstretched. Talipes equinovarus is mainly caused by an imbalance of the muscle strength of the foot. The long-term muscle imbalance causes bone and joint deformities, and weight-bearing on this deformity causes more severe deformities. Specifically, when walking, the foot is tilted to the outside so that the inside of the foot touches the ground, and the foot is deformed by plantar flexion. This can cause pain on the inside of the foot, which affects weight-bearing. The body's center of gravity falls mainly on the inside of the ankle. Restricted ankle dorsiflexion affects the forward and backward movement of the anterior tibia and increases valgus. The talocrural joint is painful and has poor stability. There may be knee hyperextension in the early support phase, lack of push-off strength, and limb clearance obstacles in the swing phase [31]. In this deformity, the affected foot rotates inward at the ankle: the foot points down and faces inward, as does the sole. 50% of clubfoot patients have bilateral clubfoot (both feet are affected); the tendons on the inner leg are shortened, the bones have unusual shapes, and the Achilles tendon is strained [32].

Methods
2.2.1 Forward and Backward Neural Networks
The ResNet-based cGAN is introduced here through two fundamental neural network stages: forward and backward neural networks. Forward neural networks include fully connected feedforward neural networks and convolutional neural networks. A forward neural network can be regarded as a function that, through multiple compositions of simple non-linear functions, achieves a complex mapping from input space to output space. Backward propagation is a standard method used in combination with optimization methods (such as gradient descent) to train an ANN. This method calculates the gradient of the loss function with respect to all weights in the network. The gradient is fed back to the optimization method to update the weights and minimize the loss function. Back-propagation requires a known output for each input value to calculate the gradient of the loss function. Therefore, although it is also used in some unsupervised networks (such as auto-encoders), it is usually regarded as a supervised learning method. It extends the delta rule to multilayer feedforward networks, using the chain rule to calculate the gradient of each layer iteratively. Back-propagation requires the activation function of the artificial neurons (or "nodes") to be differentiable. The input excitation and the response error are multiplied to obtain the gradient of the weight; the gradient is multiplied by a ratio, and its negative is added to the weight. This ratio affects the speed and effect of the training process and is therefore called the "training factor". The gradient direction represents the direction in which the error grows, so it must be reversed when updating the weight to reduce the error caused by the weight (shown in Fig. 2).
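The forward and backward mechanics above can be sketched for a single neuron; this is a minimal NumPy illustration under our own choice of sigmoid activation and squared-error loss, not the exact configuration used in the paper:

```python
import numpy as np

def backprop_step(x, target, w, lr=0.1):
    """One gradient-descent step for a single sigmoid neuron.

    Forward pass: y = sigmoid(w . x); the loss is squared error.
    Backward pass: the chain rule gives the gradient of the loss with
    respect to w, and the weight moves against the gradient direction,
    scaled by the learning rate (the "training factor" above).
    """
    y = 1.0 / (1.0 + np.exp(-np.dot(w, x)))   # forward activation
    error = y - target                        # response error
    grad = error * y * (1.0 - y) * x          # chain rule: dLoss/dw
    return w - lr * grad                      # update against the gradient
```

One such step reduces the loss for the given sample; iterating over all samples and layers is exactly the training loop that gradient descent with back-propagation performs.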

Deep Residual Network
The current progress of Deep Learning (DL) depends on techniques including initial weight selection, local receptive fields, weight sharing, etc. When using deeper networks (such as > 100 layers), the traditional difficulty of vanishing gradients during back-propagation still arises: the degradation problem [33]. The more layers, the higher the training error rate and the test error rate. The introduction of "shortcuts" can prevent the problem of gradient disappearance. Some researchers studied this aspect before ResNet and concluded that deeper networks should also be easy to optimize, with more layers yielding higher accuracy, yet in practice the training results do not compare favorably with "traditional" deep networks. First, the input X is weighted according to the CNN and passed through the activation function. After the second weighting, the input signal and the output are superimposed and then passed through the activation function again. The connection line that carries the input forward is called a shortcut. The residual in linear fitting refers to the difference between a data point and the fitted line's function value [34] [35]. An analogy can be made here: X is the input, H(x) is the desired mapping at the data point, and F(x) = H(x) - X is the residual that the stacked layers are trained to fit [36]. The basic residual unit with ReLU activation is shown in Fig. 3.
For a residual unit, the output is x_{l+1} = x_l + F(x_l, W_l), where x_l is the input of the l-th unit and F is the residual branch with weights W_l. Applying this recursively, the feature at any deeper layer L is x_L = x_l + sum_{i=l}^{L-1} F(x_i, W_i). Following the back-propagation principle, let the error be epsilon; taking the partial derivative with respect to x_l gives d(epsilon)/d(x_l) = d(epsilon)/d(x_L) * (1 + d/d(x_l) sum_{i=l}^{L-1} F(x_i, W_i)). Regarding this expression, the additive term 1 means the gradient d(epsilon)/d(x_L) is propagated directly to any shallower layer, so the gradient does not vanish even when the weighted term is small.
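These equations correspond to the following forward pass; this is a minimal NumPy sketch of a basic residual unit with a two-layer branch and pre-addition shortcut, with weight shapes chosen purely for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """Forward pass of a basic residual unit: ReLU(x + F(x)).

    F(x) is two weight layers with a ReLU between them; the shortcut
    adds the input x back before the final activation, so the identity
    path carries information (and gradients) directly across the unit.
    """
    f = w2 @ relu(w1 @ x)   # residual branch F(x)
    return relu(x + f)      # shortcut addition, then activation
```

When the branch weights are zero the unit reduces to ReLU(x), i.e. the identity path alone, which is the extreme case discussed above where the residual branch only needs to fit zero.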

Updated conditional Generative Adversarial Nets
The objective function of conditional GAN is a two-player mini-max game with conditional probability: min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x|y)] + E_{z ~ p_z(z)}[log(1 - D(G(z|y)))]. The basic structure of cGAN is illustrated in Fig. 4.
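The two sides of this objective can be written as per-batch loss terms; this is a minimal NumPy sketch in which the discriminator's outputs on real and generated (condition-paired) samples are assumed to be probabilities in (0, 1):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator side of the mini-max game for one batch.

    D maximizes log D(x|y) + log(1 - D(G(z|y))); returned negated so it
    can be minimized by a standard optimizer.
    """
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss(d_fake):
    """Generator side: G minimizes log(1 - D(G(z|y)))."""
    return np.mean(np.log(1.0 - d_fake))
```

A confident discriminator (high scores on real samples, low on fakes) yields a small discriminator loss, while the generator's loss falls as it fools the discriminator into scoring fakes higher.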

Updated Structure of cGAN Using Residual Networks
The GAN is a framework for training generative models. The original GAN trains an unconditional generative model, which offers no control over the generated data. Adding conditional constraints to the original GAN model makes it possible to guide the process of data generation; such a network is the conditional generative adversarial network, whose basic structure is shown in Fig. 4. The added condition can be a category label or other modal data. The key to the model is to add the condition as an input to both the generator and the discriminator. A simple application example uses a digit's category label as the condition and trains the generative model to generate the specific digit for a given label. cGAN can also be applied to cross-modal problems, such as automatic image annotation. The residual network-based conditional generative adversarial network (RNcGAN) is proposed in Fig. 5 to Fig. 8, together with the generator and discriminator functions with image sequencing. In this research, we improved the ReLU-based cGAN model for the plantar pressure image dataset. Algorithm 1 introduces a basic GAN model, while Algorithm 2 and Algorithm 3 illustrate the proposed improved cGAN models. The improved structure for plantar image dataset classification sets the input layers to 2 and the max pool layers to 2, 4, 4, 4; the steps are described as follows.
Step 1: Create a LayerGraph object. The hierarchy diagram specifies the network architecture and connects these layers in turn.
Step 2: Add layers to the hierarchy or remove layers from the hierarchy.
Step 3: Connect layers to establish layer connections between different layers or disconnect layers to disconnect.
Step 4: Describe the network architecture.
Step 5: Train the network using a directed acyclic graph (DAG) network model.
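The five steps above (realized in the paper with MATLAB's layer-graph tools) can be sketched in Python as follows; the class and method names here are illustrative stand-ins, not the toolbox API:

```python
class LayerGraph:
    """Minimal sketch of a layer-graph container mirroring Steps 1-4.

    Layers are stored by name and connections as directed edges, so the
    network can later be trained as a directed acyclic graph (DAG).
    """
    def __init__(self):                      # Step 1: create the graph
        self.layers = {}
        self.edges = []

    def add_layer(self, name, layer):        # Step 2: add a layer
        self.layers[name] = layer

    def remove_layer(self, name):            # Step 2: remove a layer
        self.layers.pop(name, None)
        self.edges = [e for e in self.edges if name not in e]

    def connect(self, src, dst):             # Step 3: connect layers
        self.edges.append((src, dst))

    def disconnect(self, src, dst):          # Step 3: disconnect layers
        self.edges.remove((src, dst))

    def describe(self):                      # Step 4: describe architecture
        return {"layers": list(self.layers), "connections": list(self.edges)}
```

Step 5 would then hand the described DAG to a training routine; removing a layer also drops its edges so the graph stays consistent.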

Results
Firstly, the operator starts the system and imports the relevant setting parameters. The subjects were required to walk onto the force plate and repeat the process ten times; the system software obtains plantar pressure data in real time and automatically stores the data. The operator selects the dynamic mode to get the plantar pressure change process in real time during the test (shown in Fig. 9), in which HL is heel lateral; HM is heel medial; MF is midfoot; M1 to M5 are the first to fifth metatarsals; T1 is the hallux (big toe); and T2 to T5 are the 2nd to 5th toes. The experimental device outputs discrete plantar pressure in each zone of the foot in Table 1. The subjects' walking process is divided into four stages and five intervals. For the boarding method, "take the steps on the way" was selected. For the number of tests, "take the steps on the way" collects data six times; "one-step boarding" requires at least eight data collections; "two-step boarding" requires at least five data collections. To obtain the pressure peak and pressure-time integral value, six data collections are necessary. "Double open", "double closed", "single open", and "single closed" conditions were assigned during the static balance tests. In the experimental stage, standing on two feet generally lasts twenty seconds and standing on one foot generally lasts ten seconds. The image data-set was acquired and is shown in Fig. 10 (group N) and Fig. 11 (typical left- and right-foot images from the planus (PL) and talipes equinovarus (TE) groups). The image data-set was generated by a 2-second collection with a scanning system, which has 12-bit image resolution, 16 analog channels, a sensor size of 0.5*0.7 cm^2 at a density of 4/cm^2, and a 125-300 Hz sampling rate, so there are plenty of images for the residual network-based cGAN training and classification prediction.
"NetWidth" is the width of the network, defined as the number of filters in the 3 × 3 convolutional layers of the network. "NumUnits" is the number of convolutional units in the main branch of the network. Because the number of convolutional units in each stage is the same, "NumUnits" must be an integral multiple of 3. "UnitType" is the type of convolutional unit, specified as standard or bottleneck. A standard convolutional unit consists of two 3 × 3 convolutional layers. A bottleneck convolutional unit consists of three convolutional layers: a 1×1 layer that down-samples in the channel dimension, a 3×3 convolutional layer, and a 1×1 layer that up-samples in the channel dimension. Therefore, a bottleneck unit has 50% more convolutional layers than a standard unit, while its number of 3×3 spatial convolutions is half that of a standard unit. The computational complexity of the two unit types is similar, but when using bottleneck units, the total number of features propagated in the residual connections is four times larger. The network's total depth is defined as the sum of the number of sequential convolutional layers and fully connected layers. For a network composed of standard units, the total depth is 2*numUnits + 2, and for a network composed of bottleneck units, the total depth is 3*numUnits + 2. The class set is [N, PL, TE, N-PL, N-TE, PL-N, PL-TE, TE-N, TE-PL], in which, for example, N-PL lies between normal and planus; so the classification outputs six mixed classes in addition to N, PL, and TE. We used an embedding dimension of 50 and three 5-by-5 filters corresponding to the three RGB channels of the generated plantar pressure images; a dropout probability of 0.80; 500 epochs; a minimum batch size of 128; and 8 validation images per class. The training process over 1400+ iterations is shown in Fig. 12.
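The depth formulas above can be checked with a small helper; this is a sketch whose parameter names simply mirror the options just described:

```python
def total_depth(num_units, unit_type="standard"):
    """Total network depth as defined above: sequential convolutional
    layers plus the fully connected layer(s).

    Standard units contribute two 3x3 convolutions each; bottleneck
    units contribute three convolutions (1x1 down, 3x3, 1x1 up).
    """
    if num_units % 3 != 0:
        raise ValueError("NumUnits must be an integral multiple of 3")
    layers_per_unit = {"standard": 2, "bottleneck": 3}[unit_type]
    return layers_per_unit * num_units + 2
```

For example, nine standard units give a depth of 2*9 + 2 = 20, while nine bottleneck units give 3*9 + 2 = 29.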
To compare with the proposed RNcGAN model for classifying the plantar pressure image dataset, several Deep Neural Networks (DNN) were constructed and implemented in MATLAB 2020b. A 21-input, 3-output classifier with a 20-neuron hidden layer was constructed for the subsequent comparison. The structure of this neural network is shown in Fig. 14, and its performance in classifying the plantar pressure image dataset is shown in Fig. 15. One of the most widely applied deep neural networks is GoogleNet; we constructed a GoogleNet framework for plantar pressure using the same size and population of the input data set. Furthermore, ReLU, basic GAN (bGAN), and pre-trained CNN models were also constructed (Fig. 16 shows the predicted versus true classes for these typical networks on the plantar pressure image dataset). The typical indices of image classification were compared across the different classifiers: Accuracy (AC), Precision (P), Recall (Re), F-measurement (F1), the Receiver Operating Characteristic (ROC) curve, the Area Under the Curve (AUC), and the Precision-Recall (P-R) curve. For a two-class classification problem, which divides instances into positive or negative classes, four situations occur in actual classification. If an instance is a positive class and is predicted to be positive, it is a True Positive (TP); if an instance is positive but is predicted to be negative, it is a False Negative (FN); if an instance is negative but is predicted to be positive, it is a False Positive (FP); if an instance is negative and is predicted to be negative, it is a True Negative (TN). The F-measurement is denoted F1. Some ROC- and AUC-related indices were also compared.
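The four outcomes map to the evaluation indices as follows; this is a minimal Python sketch of the standard definitions:

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation indices defined above from the four
    outcomes (TP, FN, FP, TN) of a two-class problem."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # = sensitivity (SE) = TPR
    specificity = tn / (tn + fp)     # = SP = TNR
    f1 = 2 * precision * recall / (precision + recall)
    return {"AC": accuracy, "P": precision, "Re": recall,
            "SP": specificity, "F1": f1}
```

For multi-class problems such as the nine foot types here, these indices are typically computed per class (one class versus the rest) and then averaged.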
The true positive rate (TPR), also known as sensitivity (SE), describes the proportion of all positive instances that are correctly classified by the classifier; it is the same as recall: TPR = TP / (TP + FN). The true negative rate (TNR), also known as specificity (SP), describes the proportion of negative instances correctly classified by the classifier among all negative instances: TNR = TN / (TN + FP). In this research, AC, SE, SP, and F1 are calculated for the comparative analysis. ANN [37], k-nearest neighbor (kNN) [38], Fast region-based convolutional neural network (Fast R-CNN) [39], Visual Geometry Group-16 (VGG16) [40], Scaled Conjugate Gradient CNN (SCG-CNN) [41], GoogleNet [42], AlexNet [43], ResNet-50-177 [44], and Inception-v3 [45] were selected for the comparative analysis. The proposed model's performance is listed in Table 2; the results show the dominance of the proposed improved residual network-based cGAN model (RNcGAN) in the indices AC, SE, SP, and F1, with particularly high effectiveness in AC, SE, and F1.

Conclusions
Some foot characteristic factors of the human body affect the fit of shoes. By analyzing the characteristic distribution data of the sole, the mechanical bearing characteristics of the bottom surface can be obtained, and the last-surface optimization law can be further derived. Foot pressure detection and analysis can directly use the different pressure values in different areas, which reflect the elastic changes of the plantar tissues in different states. The pressure values of different areas of the plantar surface form a time-series pressure curve, and these data can be processed into a plantar pressure distribution image; the image shows the degree of foot inversion and eversion and the overall line of motion of the foot during walking. Excessive foot varus or valgus may cause a certain degree of foot injury, eventually forming the PL and TE foot shapes. Deep learning can classify these images well, obtain a large number of foot shapes, and help design comfortable shoes. In this paper, better results were obtained by improving the RN and cGAN. This network model can be used in a wide range of cases of direct pixel-level image classification. The trained network is more sensitive in classification, and the model has better generalization capabilities. The classification accuracy is over 90%. Future work includes increasing the scale of the collected data set, improving the network structure to obtain better classification results, and further equipping the model with transfer-learning ability.