An Automatic Evaluation Method for Parkinson's Dyskinesia Using Finger Tapping Video for Small Samples

The assessment of dyskinesia in Parkinson's disease (PD) using artificial intelligence is a significant and challenging task. Based on the representative action of finger tapping, this paper designs an automatic evaluation method using deep learning to provide big-data screening and preliminary evaluation for potential patients with PD. Presently, doctors use the Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of patients. This method is time-consuming and laborious, and it is subject to subjective differences. Evaluation methods based on sensor equipment are also widely used, but they are expensive and need professional guidance, which makes them unsuitable for remote evaluation and patient self-examination. Additionally, it is difficult to collect patient data for medical research, so it is important to find an objective and automatic assessment method for Parkinson's dyskinesia that works with small samples. In this study, we designed an automatic evaluation method that combines manual features with a convolutional neural network (CNN) and is suitable for small-sample classification. Based on finger tapping videos of patients with PD, we used a pose estimation model to obtain the action skeleton information and calculated the feature data. We then trained the model with five-fold cross validation to achieve a good trade-off between bias and variance, and finally made a multi-class prediction through a fully convolutional network (FCN). The proposed method achieves an accuracy of 79.7%, which is currently the best in this line of research. Compared with the latest related methods, our method is superior in terms of accuracy, number of parameters and floating point operations (FLOPs). The proposed method does not require patients to wear sensor devices and has clear advantages in providing early screening and remote clinical evaluation for patients with PD.
In addition, training the CNN model on motion feature data yields the best accuracy, effectively addresses the problem of limited data acquisition in medicine, and provides a novel approach for small-sample classification.


Introduction
Parkinson's disease is the second most common neurodegenerative disease after Alzheimer's disease [1]. Patients exhibit resting tremor, muscle stiffness, bradykinesia and postural instability [2,3]. Accurate and objective evaluation is needed to guide the treatment of Parkinson's disease. At present, there are many ways to evaluate the motor function of patients with PD, among which the MDS-UPDRS [4], the standard rating scale for PD evaluation, is widely used to evaluate the motor level of patients because of its simplicity and comprehensiveness [5]. However, the accuracy of scale-based evaluation depends directly on doctors' clinical experience and is subject to subjective differences. It is therefore of great significance to provide an objective and automatic evaluation method for the early screening of suspected early-stage patients, and to assist doctors' clinical evaluation and patients' self-examination.
Driven by clinical needs and the rapid development of deep learning, CNNs are widely used in the evaluation and diagnosis of PD. Research has focused on the assessment of motor disorders, pathological analysis and early diagnosis of PD [6][7][8]. Bradykinesia is the core symptom of Parkinson's disease, and the finger tapping test is often used to evaluate bradykinesia in PD [9,10]. Gao et al. [9] demonstrated that finger tapping can be used to evaluate different severities of bradykinesia in PD and can differentiate early-stage PD from normal controls. Our work focuses on bradykinesia in PD. At present, the primary deep learning-based research approach is to obtain feature data through sensor devices for monitoring, analysis and evaluation [11][12][13]. For example, the body sensor network system proposed by Parisi et al. [14] automatically evaluates the severity of PD patients' gait by extracting kinematic features in the time and frequency domains to characterize Parkinsonian gait. The mechanical impedance-based detection system proposed by Dai et al. [15] quantitatively evaluates myotonia in patients with PD by extracting body movement information. Research methods based on sensors or wearables can obtain deeper and more complex information [16] and have great potential in the quantitative evaluation of PD and the development of treatment equipment. However, they depend on the guidance of professionals, the equipment restricts the movement of patients to a certain extent, and the application scenarios are limited, so they are not suited to the regular evaluation of patients with PD.
We propose an action recognition model that evaluates PD without any professional equipment. CNN-based action recognition models have demonstrated strong performance, particularly on skeleton datasets [17,18], and have achieved excellent results on several action recognition tasks. For example, the method proposed by Li et al. [19] achieves 92.08% accuracy on the skeleton-based ChaLearn gesture dataset. Chen et al. [20] propose a channel-wise topology refinement graph convolution network that achieves outstanding action recognition performance on the NTU RGB+D 120 dataset, with cross-subject and cross-view accuracies of 88.9% and 90.6%, respectively. Despite their strong performance, these methods focus on coarse-grained tasks, most of which involve identifying activity scenes with prominent action differences, and they have difficulty dealing with complex tasks [21,22]. Additionally, action recognition model architectures are complex, and model training depends on large amounts of data. However, it is difficult to obtain datasets in medical research, especially clinical data based on patients' behavior. Because of the limited number of patients and the need to protect their privacy, the volume of data collected is frequently insufficient for training complex models. Thus, it is necessary to find a method that can evaluate PD from a small sample size.
The finger tapping test is used to evaluate motor dysfunction in patients [23] and in neurophysiological examinations [24], since the motor characteristics of finger tapping are closely linked to bradykinesia [25][26][27]. For example, Tavares et al. [25] demonstrate a correlation between finger tapping and dyskinesia in patients with PD, and reveal the improvement in dyskinesia achieved through medication and deep brain stimulation. Liu et al. [26] use a sensor-based finger tapping evaluation system as a reference for the clinical evaluation of hand movement function in patients with PD. The MDS-UPDRS provides a grading standard for the finger tapping test as a reference for doctors evaluating PD, which also provides the theoretical basis for this paper. According to Goetz et al. [28], accurately scoring and interpreting finger tapping test results requires a wealth of experience, which makes it one of the most difficult items to evaluate in a PD motor examination, even for experienced doctors. Subtle differences in movement are difficult to detect. According to the evaluation criteria for finger tapping, the difference between adjacent scores (for example, scores of 1 and 2) is fuzzy. This highly fine-grained recognition task poses a severe challenge to CNN-based classification models. Our research addresses this highly fine-grained finger tapping task.
In view of the aforementioned problems, this paper designs a method that combines manual features with a CNN to evaluate and classify the Parkinson's finger tapping test, based on a Parkinson's video dataset. The main contributions are as follows: (i) Sensors and other auxiliary equipment are inconvenient, need special guidance, and cannot continuously and promptly record disease changes in patients with PD. The proposed evaluation method extracts feature data from a video dataset and then evaluates and grades it with a CNN. The dataset is recorded on a smartphone and does not depend on sensors. (ii) In view of the minor differences between scores in the finger tapping test, this paper summarizes how the finger tapping action changes and designs a feature based on the range and velocity of the action. First, a pose estimation algorithm extracts the hand skeleton data; then the feature data are calculated from the hand skeleton data; finally, a CNN evaluates and grades the action, which provides a novel method for PD evaluation. (iii) The combination of manual features and CNN proposed in this paper effectively addresses the difficulty of data acquisition in medical research. Compared with previous action recognition models, which depend on large datasets, this method achieves the best performance with small sample sizes.
The other parts of this paper are arranged as follows: In the second part, the method that combines manual features and CNN proposed in this paper is introduced in detail; In the third part, the experiments are presented, including dataset preparation, experimental results of the methods, comparison experiment with the existing methods and an analysis experiment for manual features; In the fourth part, the methods and experimental results are discussed and analyzed; Finally, the fifth part concludes this paper.

Subjects
Ninety-three patients with PD recruited from the outpatient section of the Movement Disorder Clinic of Zhejiang Hospital and 27 age- and gender-matched normal controls (NC) participated in this study. The diagnosis of PD was confirmed by at least two movement disorder specialists according to the 2015 MDS clinical diagnostic criteria [29]. Exclusion criteria were: moderate-to-severe dementia, defined as a mini-mental state examination (MMSE) score < 24 [30]; incapacity to give informed consent; and atypical Parkinson's syndromes such as vascular Parkinsonism, drug-induced Parkinsonism, progressive supranuclear palsy (PSP), and multiple system atrophy (MSA). All the patients with PD were treated with anti-parkinsonism medication and were assessed in the "off" medication state. This study was approved by the Ethics Committee of Zhejiang Hospital. All participants signed informed consent.

Methods
The flow chart of the proposed finger tapping evaluation method is illustrated in Fig. 1. First, the finger tapping test videos of patients with PD are collected, and the skeleton data of the hands are extracted by the pose estimation model MediaPipe Hands [31]. Subsequently, feature data based on the motion pattern of the hands are extracted with the method designed in this paper, yielding one-dimensional time series data. After data pre-processing, that is, normalization and cropping alignment, the data are input into the FCN. The output is a five-class result, which is the predicted MDS-UPDRS score for the corresponding finger tapping.

Pose estimation
The MediaPipe algorithm is used to extract the hand skeleton data from the finger tapping videos of patients with PD. MediaPipe Hands is one of the most advanced frameworks for hand skeleton estimation and is robust to partially visible and occluded hands. It detects the hand skeleton with two models working together: (1) a Palm Detection Model, which searches the entire image and returns the predicted hand bounding box; (2) a Hand Landmark Model, which operates on the hand region returned by the palm detector and returns high-fidelity hand key points. The key points returned by the model contain the 3D coordinates of 21 hand joints; the position and name of each joint are illustrated in Fig. 2.
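As a rough illustration of this step, the sketch below reads a video frame by frame and collects the 21 hand joints per frame. It assumes the legacy MediaPipe Solutions API (`mp.solutions.hands.Hands`) and OpenCV; the function names, the single-hand setting and the choice to skip frames without a detected hand are assumptions made for illustration, not the authors' implementation. The two libraries are imported lazily so that the small conversion helper can be tried without them.

```python
from types import SimpleNamespace

NUM_JOINTS = 21                      # MediaPipe Hands returns 21 landmarks per hand
THUMB_TIP, INDEX_FINGER_TIP = 4, 8   # landmark indices used later for the feature


def landmarks_to_joints(landmarks):
    """Convert landmark objects (with .x, .y, .z) into a list of (x, y, z) tuples."""
    joints = [(lm.x, lm.y, lm.z) for lm in landmarks]
    if len(joints) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
    return joints


def extract_hand_skeleton(video_path):
    """Return one 21-joint skeleton for every frame in which a hand is detected."""
    # Imported here (assumption: both packages are installed) so that the
    # helper above stays usable without them.
    import cv2
    import mediapipe as mp

    skeletons = []
    capture = cv2.VideoCapture(video_path)
    with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while True:
            ok, frame_bgr = capture.read()
            if not ok:
                break
            # MediaPipe expects RGB input, while OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                hand = result.multi_hand_landmarks[0]
                skeletons.append(landmarks_to_joints(hand.landmark))
    capture.release()
    return skeletons


# Stand-in landmarks (no video or MediaPipe needed) to show the output shape.
fake_hand = [SimpleNamespace(x=float(i), y=0.5, z=0.0) for i in range(NUM_JOINTS)]
joints = landmarks_to_joints(fake_hand)
```

With a real recording, `extract_hand_skeleton("tapping.mp4")` (a hypothetical file name) would return a list with one entry per detected frame, each entry holding 21 coordinate triples.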
We applied MediaPipe Hands to the video data of patients with PD; the effect is shown in Fig. 3. From this we obtain the hand skeleton data of the finger tapping of all patients. Each video frame corresponds to a set of three-dimensional joint data. We use the 3D coordinate $J_i = \{x_i, y_i, z_i\}$ to represent joint $i$. Assume that each video contains $T$ frames and that each hand skeleton includes $u$ joint points; in this work, $u = 21$. The hand feature $M_t$ of frame $t$ is then the set of the $u$ joint coordinates detected in that frame.

Manual Feature Extraction
The feature extraction method is designed based on the action evaluation standard of the MDS-UPDRS. By analyzing how index finger tapping is evaluated, we find that the difference between scores is reflected in the opening range, tapping velocity and interruption behavior of the fingers; we therefore designed a feature extraction method to characterize these action patterns. Section 2.1 describes the three-dimensional hand skeleton data obtained using MediaPipe. To calculate the range and velocity changes of finger tapping, this study selects the three-dimensional data of joint 4 (THUMB_TIP) and joint 8 (INDEX_FINGER_TIP), which are $J_4 = \{x_4, y_4, z_4\}$ and $J_8 = \{x_8, y_8, z_8\}$, respectively. The fingertip distance of frame $t$ is obtained as the Euclidean distance between the two joints:

$$D_t = \sqrt{(x_4 - x_8)^2 + (y_4 - y_8)^2 + (z_4 - z_8)^2}$$
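The fingertip distance described here, the Euclidean distance between joint 4 and joint 8 in each frame, can be sketched in a few lines. The data layout (one list of 21 coordinate triples per frame) and the function names are assumptions for illustration.

```python
import math

THUMB_TIP, INDEX_FINGER_TIP = 4, 8  # joint indices named in the text


def fingertip_distance(joints):
    """Euclidean distance between the thumb tip and the index fingertip."""
    return math.dist(joints[THUMB_TIP], joints[INDEX_FINGER_TIP])


def distance_series(frames):
    """Fingertip distance for every frame of one video."""
    return [fingertip_distance(joints) for joints in frames]


# Toy frame: every joint at the origin except the index fingertip.
toy_frame = [(0.0, 0.0, 0.0)] * 21
toy_frame[INDEX_FINGER_TIP] = (3.0, 4.0, 0.0)
toy_distances = distance_series([toy_frame, [(0.0, 0.0, 0.0)] * 21])
```

In the toy example the first frame gives a distance of 5 (a 3-4-5 triangle) and the second, with closed fingers, gives 0.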
$D_t$ is extracted based on the pixel distance in the image. Since changes in shooting distance and camera jitter may cause errors, this study applies Z-score standardization to obtain the standardized data. First, the mean $\mu$ and standard deviation $\sigma$ of $D_t$ are calculated. Then, the standardized data of frame $t$ is expressed as

$$S_t = \frac{D_t - \mu}{\sigma}$$

The one-dimensional sequence data of the $k$-th video is expressed as $R^k = \{S^k_1, S^k_2, \ldots, S^k_T\}$, which is the final model input in this paper. Time series classification is a classical task in deep learning, and the feature data extracted in this paper have the form of a time series, so they are well suited to processing with a CNN.
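A minimal sketch of the Z-score standardization of one video's distance sequence follows. The text does not say whether the population or the sample standard deviation is used; the population form (`statistics.pstdev`) is an assumption here, as is the function name.

```python
import statistics


def z_score(distances):
    """Standardize a distance sequence to zero mean and unit standard deviation."""
    mu = statistics.fmean(distances)
    sigma = statistics.pstdev(distances)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("constant sequence cannot be standardized")
    return [(d - mu) / sigma for d in distances]


standardized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because each video is standardized with its own mean and standard deviation, a recording made farther from the camera (smaller distances overall) yields a sequence on the same scale as one made close up.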
To test how the range and the velocity of finger tapping vary separately, the range and velocity data of ten finger taps are extracted to verify the experimental results. The range and velocity sequences of the $k$-th video are denoted $A^k = \{a_1, a_2, \ldots, a_{10}\}$ and $B^k = \{b_1, b_2, \ldots, b_{10}\}$, where $a_i$ and $b_i$ represent the range and velocity of the $i$-th action, $i = 1, 2, \ldots, 10$. As illustrated in Fig. 4, the interval from $m_i$ to $m_{i+1}$ is one complete action cycle, and $m_i$ and $n_i$ represent the maximum and minimum of the $i$-th action, respectively. The range is therefore $a_i = m_i - n_i$, and the time taken by one cycle, $b_i = t_{m_{i+1}} - t_{m_i}$, represents the velocity of the $i$-th action.
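The range and velocity of each action can be sketched as follows. The text does not state how the maxima are located, so the simple local-maximum rule below is an assumption (real recordings would likely need smoothing first), and the velocity measure is reported in frames between consecutive maxima.

```python
def local_maxima(series):
    """Indices t where the series rises into t and does not rise after it."""
    return [t for t in range(1, len(series) - 1)
            if series[t - 1] < series[t] >= series[t + 1]]


def range_and_velocity(series, num_actions=10):
    """Per-action range (maximum minus minimum) and cycle length in frames."""
    peaks = local_maxima(series)
    ranges, cycle_lengths = [], []
    for start, end in zip(peaks, peaks[1:]):
        trough = min(series[start:end + 1])      # minimum inside this cycle
        ranges.append(series[start] - trough)    # a_i = m_i - n_i
        cycle_lengths.append(end - start)        # b_i = t_{m_{i+1}} - t_{m_i}
    return ranges[:num_actions], cycle_lengths[:num_actions]


# Toy sequence with three maxima, hence two complete cycles.
toy_series = [0.0, 1.0, 0.0, 0.75, 0.25, 1.0, 0.0]
toy_ranges, toy_cycles = range_and_velocity(toy_series)
```

At the 30 FPS frame rate reported for the recordings, dividing a cycle length by 30 converts it from frames to seconds.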

Model Framework
The modified FCN model framework is illustrated in Fig. 5.
As a sequence classification model, the model takes one-dimensional sequence data as input, and its overall structure is composed of three convolutional layers and a global average pooling (GAP) layer. The convolutional layers are used for feature extraction; the output of each is followed by batch normalization and the ReLU activation function. The GAP layer is used for classification and is followed by the Softmax activation function. Our model uses the GAP layer instead of a fully connected layer; it accepts sequences of any length and maps the feature maps of the last convolutional layer more directly to the categories, which supports accurate classification. The model is modified according to the characteristics of our self-built dataset. Details of the modified model are presented in Table 1.
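To make the role of the GAP layer concrete, the sketch below runs a forward pass through a deliberately reduced model: one convolutional layer with ReLU, global average pooling, and Softmax. It is not the model of Table 1: the kernels are random and untrained, batch normalization and the other two convolutional layers are omitted, and all names are hypothetical. It only illustrates why pooling over time lets the same weights handle sequences of different lengths.

```python
import math
import random


def conv1d_relu(sequence, kernels):
    """Valid 1D convolution of a single-channel sequence, followed by ReLU."""
    width = len(kernels[0])
    return [[max(0.0, sum(w * sequence[t + j] for j, w in enumerate(kernel)))
             for t in range(len(sequence) - width + 1)]
            for kernel in kernels]


def global_average_pooling(feature_maps):
    """Average each feature map over time, giving one value per channel."""
    return [sum(fm) / len(fm) for fm in feature_maps]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sequence, kernels):
    return softmax(global_average_pooling(conv1d_relu(sequence, kernels)))


rng = random.Random(0)
# Five kernels of width 3: one output channel per MDS-UPDRS score (0 to 4).
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
short_probs = predict([math.sin(0.3 * t) for t in range(40)], kernels)
long_probs = predict([math.sin(0.3 * t) for t in range(300)], kernels)
```

Both calls return five class probabilities even though the inputs have 40 and 300 samples, which a fully connected layer with a fixed input size could not do without resizing the input.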

Experiment and Results
This section presents the experimental process and results. The experimental preparation introduces the dataset used in this paper and the main indicators for measuring the performance of the CNN. The experiment consisted of four parts. The first part is an action recognition experiment. The classification of finger tapping is an action recognition task, and the shortcomings of action

$$A = \{a_1, a_2, \ldots, a_{10}\} = \{m_1 - n_1, m_2 - n_2, \ldots, m_{10} - n_{10}\} \tag{4}$$

Fig. 4 Schematic diagram of the finger tapping data

Experimental Preparation
The dataset, collected at Zhejiang Hospital in China, contains 252 videos from 120 people. It consists of videos of the finger tapping of both hands of each participant. All data were obtained with the written consent of the participants. The sex and age of the participants are presented in Table 2. The subjects in this study were patients with PD and normal controls. However, abnormal finger tapping occurs not only in PD but also in essential tremor and psychogenic dyskinesia. This study did not further distinguish these diseases, which we will address in future work. All the video data used in this research were recorded on ordinary smartphones at a frame rate of 30 frames per second (FPS). We clipped the finger tapping video of each hand separately and obtained 252 videos, each containing 10 or more finger taps. According to the evaluation criteria of the MDS-UPDRS, the ground-truth labels of the finger tapping test are divided into five levels: 0 = normal, 1 = slight, 2 = mild, 3 = moderate, and 4 = severe. Because the difference between adjacent scores is hard to determine, the true scores were assigned by Parkinson's disease subspecialists with rich clinical experience. The score distribution is presented in Table 3. Due to the small number of severely affected patients in the hospital, only two cases were scored 4. In subsequent research, we will continue to collect additional patient data.
The hardware environment used to test the proposed algorithm is an Ubuntu 18.04 computer with an NVIDIA GeForce GTX 1080 Ti (11 GB). Preprocessing used OpenCV 4.4.0, and the model was built on TensorFlow 1.11.0. Four indicators, accuracy, precision, recall and F1-score, were used to measure classification performance. Accuracy is the proportion of all samples that the model predicts correctly; precision is the proportion of samples predicted as positive that are truly positive; recall is the proportion of truly positive samples that are predicted as positive; the F1-score is the harmonic mean of precision and recall and ranges from 0 to 1. The higher the value, the more accurate the output of the model. With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, the four indicators are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
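A minimal sketch of computing the four indicators from a multi-class confusion matrix follows. Each class is treated in turn as the positive class and the per-class values are averaged with equal weight (macro averaging); the text does not state which averaging is used, so this is an assumption, and the example matrix is made up for illustration rather than taken from the results.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))
    precisions, recalls, f1_scores = [], [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / num_classes,  # macro average (assumption)
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1_scores) / num_classes,
    }


# Made-up three-class example: 24 of 30 samples lie on the diagonal.
example_metrics = classification_metrics([[8, 2, 0], [1, 7, 2], [0, 1, 9]])
perfect_metrics = classification_metrics([[5, 0], [0, 5]])
```

With strongly imbalanced classes, such as the two samples scored 4 in this dataset, macro averaging gives a rare class the same weight as a common one, which is worth keeping in mind when reading averaged scores.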

Action Recognition-Based Experiments
Because the finger tapping test presents differently in patients with different severities of Parkinson's disease, this study treats the classification of the finger tapping test as an action recognition task and evaluates and classifies patients with PD by analyzing action differences. To test the effect of the action recognition approach, this study uses mature action recognition models, namely a two-stream model and a 3D CNN. The 3D CNN model takes video or image frames directly as input without preprocessing, while the two-stream model takes image frames and optical flow frames. This paper extracts all video image frames and two-dimensional optical flow frames, as illustrated in Fig. 6.
The experimental results for Two-stream Fusion [32] and R2+1D+BERT [33] are illustrated in Fig. 7. We observed that the training of neither model converged, so the action recognition-based approach does not solve the fine-grained task of the finger tapping test. The non-convergence was primarily caused by the minimal difference between finger tapping scores. Such a highly fine-grained task requires a large amount of training data, but collecting enough data to support the training of complex models remains a challenge in medical research. Moreover, the structure of the action recognition model is

Experiments Based on Manual Features
To compensate for the difficulty of dataset collection in medicine, we combine manually designed action features with a CNN and develop a feature based on the pattern of the finger tapping test. According to the method described in Sect. 3.2, we extracted the manual feature data from the 252 videos for training. The batch size of the training process is 16, the number of iterations is 1000, and the learning rate is set to 1e-7. The training process is illustrated in Fig. 8, which shows that the model training converges successfully.
To obtain a reliable and stable estimate of model accuracy from our small dataset, we apply five-fold cross validation, which is often used when training models on small-scale datasets because it improves model evaluation and selection. The five-fold cross validation method is illustrated in Fig. 9. The dataset is divided into five equal parts. The first fold is used as the test set and the other folds as the training set to obtain an accuracy; then the second fold is used as the test set and the other folds as the training set, and so on. This yields five accuracies, which are averaged to obtain the model's accuracy. As shown in Table 4, we obtain a classification accuracy of 79.7%, which is the best accuracy reported for research based on the finger tapping test. Table 5 presents the results of the four indicators, which are also obtained from the five-fold cross-validated data. The overall results reflect the excellent performance of this model. Figure 10 shows the confusion matrix of the classification results. The misclassification rate of each score is within an acceptable range, and most prediction errors fall on adjacent scores, demonstrating that the model is robust.
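The five-fold procedure can be sketched as below. The shuffling, the fixed seed and the `evaluate` callback are assumptions for illustration; the text does not say whether the folds were shuffled or stratified by score.

```python
import random


def k_fold_indices(num_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(num_samples, evaluate, k=5):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(num_samples, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(252))
# Placeholder evaluation that only reports the share of samples held out.
mean_test_share = cross_validate(252, lambda train, test: len(test) / 252)
```

With 252 samples the folds hold 51, 51, 50, 50 and 50 samples, so "equal parts" is only approximate. In practice `evaluate` would train the model on `train` and return its accuracy on `test`.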

Comparison Experiment
The proposed method is based on manual feature data. As it is a new approach to PD evaluation, there are few experimental results to compare against. We compared the proposed method with the best existing models based on sensor and skeleton data; Table 6 shows that the accuracy of our method is significantly higher than that of the other methods. Additionally, as a time series classification model, the model used in this paper has advantages in the number of parameters and FLOPs compared with other models.
Based on the dataset, we also compared different time series models [37]. The proposed model obtains the best results, as illustrated in Table 7. Compared with the ResNet model, our proposed model has fewer parameters and obvious advantages in terms of computing resources. Table 8 compares the results of the proposed model with those of other works. Compared with video-based research work, our method obtains higher accuracy with a smaller sample size, which reflects the advantages of our proposal for small-sample data.

Fig. 8 Training process of the model: (A) loss and (B) accuracy

Fig. 9 Schematic diagram of fivefold cross validation

Feature Analysis Experiment
The previous experiments have verified the reliability of this method and demonstrate that our manual feature design is reasonable and effective, and that the range and velocity of finger tapping provide key information for PD evaluation. To further verify the role of range and velocity information in the evaluation of PD, we carried out several experiments. First, we carried out an unsupervised clustering experiment on the manual feature data of finger tapping [39]. The results are illustrated in Fig. 11. Because there are too few samples with a score of 4 in this dataset, we excluded them to reduce their interference and carried out a four-class clustering experiment.
Although the classification accuracy of the unsupervised experiment is not high, we observe a close correlation between the distribution of the four types of data in Fig. 11 and the evaluation criteria in the MDS-UPDRS: the action patterns of the four types of data are consistent with those of the four scores. As shown in Fig. 11, the range and velocity of finger tapping in (A) are more uniform, while the ranges in (B), (C) and (D) fluctuate increasingly. On this basis, we extract the range and velocity information from the feature data for further experiments, that is, the $A^k$ and $B^k$ obtained in Sect. 2.2.2. The classification results obtained by inputting $A^k$ and $B^k$ into the FCN model are presented in Table 9. The accuracy with range data is much higher than with velocity data: the range results conform to our expectations, but the classification performance of the velocity experiment is poor.

Discussion
In this study, we extracted self-designed manual features to evaluate the finger tapping test of patients with PD, and preliminarily demonstrated the feasibility of the finger tapping test for PD evaluation. The conclusions are based on experimental results on our self-built dataset. The experiments use open-source code so that other researchers can study and verify the results. First, we carried out the action recognition experiment on the original video data. The training of mature action recognition models did not converge, because it is difficult to collect sufficient data in medical research. Training an action recognition model requires a large amount of data, and without additional detailed features this approach does not solve the highly fine-grained task of finger tapping, which also demonstrates the necessity of extracting skeleton information and designing manual features.
Based on the action evaluation standard for finger tapping in the MDS-UPDRS, we designed a manual feature and trained the model on the feature data. This step requires only a small amount of data and effectively addresses the dependence of action recognition methods on large data volumes. Comparative experiments against other studies, especially video-based work, show that our proposed method has obvious advantages with small sample sizes. To further verify the rationality of the feature design, we first carried out an unsupervised clustering experiment on the feature data and found that the clustering distribution of the feature data is consistent with the MDS-UPDRS criteria, demonstrating that the feature we designed reflects the motor dysfunction of patients with PD and that the method adopted in this paper is reasonable.
To verify the key information in the clustering experiment, we extracted the range and velocity feature data for further experiments. The range feature experiment conformed to our expectation, but the velocity feature experiment performed poorly. To explain this, we drew line charts of the ranges and velocities for the different scores using random sampling. We sampled five recordings for each score to facilitate observation, and the results are illustrated in Figs. 12 and 13. As shown in Fig. 12, the fluctuation of the range value gradually increases with the score, and the range shows a downward trend, in line with the description in the scale. However, no obvious pattern is found in the velocity data. We analyzed how the velocity data were collected and found that the main cause of the error was that the data were collected without professional equipment: the FPS of smartphones is low, so much of the velocity information in the captured videos is lost, leading to unsatisfactory results. We will continue to make improvements in subsequent experiments as we continue to analyze the velocity changes in finger tapping.

In summary, the method designed in this paper has achieved the best results. Several experiments were set up to verify the rationality, reliability and accuracy of the method, and the experimental results were fully discussed and analyzed. However, the experiments face challenges, such as a lack of data and data imbalance. For example, there are only two cases with a score of 4 in the finger tapping experiment, so many experiments had to be run on only four classes of data, which leaves our experiments incomplete. We will continue to collect additional data to make the experiments more complete.

Fig. 12 Range data diagram of four scores; the abscissa represents the 10 actions and the ordinate represents the range, with values from 0 to 1

Fig. 13 Velocity data diagram of four scores; the abscissa represents the 10 actions and the ordinate represents the velocity

Conclusion
A method combining manual features with a CNN is proposed for the evaluation and classification of finger tapping results for patients with PD, with the aim of providing reliable MDS-UPDRS scores. With the assistance of AI technology, the proposed method allows users to assess the motor function of patients with PD daily by simply capturing video with an ordinary smartphone, and to obtain "quantifiable" and "refined" finger tapping scores. This supports early screening of Parkinson's disease and improves the efficiency of diagnosis and the quality of treatment.
The proposed evaluation method achieves a best accuracy of 79.7% on a self-built dataset, providing a new idea for PD action evaluation. Through several experiments, we show that the manual feature we designed is consistent with the scoring criteria of the MDS-UPDRS, which verifies the rationality and reliability of the method. Because patients do not need to wear additional sensors, the method has clear advantages in remote clinical evaluation. Moreover, the manual feature we developed is suitable for small-sample classification, which effectively addresses the difficulty of data acquisition in medicine. In the future, we will conduct further research and analysis to verify and improve the method proposed in this paper.