Classifying Elderly Pain Severity from Open-Source Automated Video Clip Facial Action Unit Analysis: A Study from the Integrated Pain Artificial Intelligence Network (I-PAIN) Data Repository.


Background: There is increasing interest in monitoring pain severity in elderly individuals by applying machine learning models. In previous studies, OpenFace©, a well-known automated facial analysis algorithm, was used to detect facial action units (FAUs), which otherwise require long hours of human coding. However, OpenFace© was developed from datasets dominated by young Caucasian participants whose pain was elicited in the laboratory. Therefore, this study aims to evaluate the accuracy and feasibility of a model using OpenFace© data to classify pain severity in elderly Asian patients in clinical settings.

Methods: Data from 255 Thai individuals with chronic pain were collected at Chiang Mai Medical School Hospital. A phone camera recorded each face for 10 seconds at a 1-meter distance shortly after the patient provided a self-rating of pain severity. For those unable to self-rate, the video was recorded immediately after a movement expected to elicit pain. A trained assistant rated each video clip using the Pain Assessment in Advanced Dementia (PAINAD) scale. Pain severity was classified as mild, moderate, or severe. OpenFace© processed each video clip into 18 FAUs. Six classification models were used: logistic regression, multilayer perceptron, naïve Bayes, decision tree, k-nearest neighbors (KNN), and support vector machine (SVM).

Results: Among the models that included only the FAUs described in the literature (FAUs 4, 6, 7, 9, 10, 25, 26, 27, and 45), the multilayer perceptron yielded the highest accuracy, 50%. Among models using machine-learning-selected features, the SVM model with FAUs 1, 2, 4, 7, 9, 10, 12, 20, 25, and 45 plus gender yielded the best accuracy, 58%.

Conclusion: Our open-source automatic video clip facial action unit analysis experiment was not robust for classifying elderly pain. Retraining facial action unit detection algorithms, enhancing frame selection strategies, and adding pain-related functions may improve the accuracy and feasibility of the model.


Introduction
Pain severity is crucial information for pain management decision-making. However, accurate assessment in older patients remains challenging. While self-reported pain ratings are the gold standard, many elderly patients have limited cognitive and physical function, making pain more difficult to assess. In addition, the COVID-19 situation and the shortage of caregivers make hospital visits more complicated. Therefore, home-based pain management becomes essential for these patients. Integrating machine learning to provide automated and continuous pain monitoring at home could be an ideal solution.
Objective pain measurements such as fMRI and heart rate variability exist, but they are complicated and not feasible in real clinical settings. Facial expression is an accessible, informative feature associated with pain severity, but several limitations inhibit its real-life application.
The original facial activity measurement, the Facial Action Coding System (FACS) [1], requires manual coding, which is both time-consuming and costly. There have been attempts to develop automated facial analysis algorithms to overcome this limitation. One well-known algorithm is OpenFace© [2]; however, it was trained on datasets dominated by young, healthy Caucasian participants. The widely used UNBC McMaster Pain Dataset was built by eliciting pain in shoulder pain patients who were videotaped in a laboratory setting [3]. Therefore, 'wrinkles' in the elderly [4], cognitive impairment [5], and gender- or ethnicity-related skin tone [6] might introduce representation bias into the model. Furthermore, in the regular clinical environment, chronic pain patients usually have persistent pain accompanied by spontaneous pain without physical pain stimuli. In everyday life, facial angle and lighting are hard to set as perfectly as in a laboratory. These factors may impact the performance of the model.
OpenFace© [2] is a deep neural network algorithm that reports 18 of the original 46 AUs. It cannot discriminate AU24-AU27, which use the same muscles as AU23, nor AU43 (eye closure) from AU45.
OpenFace© outputs a value for each AU, indicating the algorithm's confidence that the AU is present.
The per-frame label, an integer value from 0 to 5, is estimated by either classification or regression. Computer vision thus detects FAUs as time series of continuous values. Previous studies applied two methods for deriving a point estimate from the time series of each AU: the first uses the mean measurement of each AU [7], while the second calculates the area under the curve of the FAU pulse [8]. The accuracy of OpenFace© compared to human FACS coding is 90% for constrained images and 80% for everyday-life images. However, the validity of OpenFace© in detecting faces of elderly individuals and those with dementia is still problematic. One study using manual FACS coding found that OpenFace© has a precision of only 54% for AU4 and 70.4% for AU43 [5]. Meanwhile, the Delaware database project applied OpenFace© to a population younger than 30 years with a higher proportion of non-Caucasian ethnicity and found a precision of 98% for AU4 and 73% for AU45 [9].
Defining pain-related facial action units from computer vision automated facial coding is the most challenging area. The benchmark study on pain-related facial action units by Prkachin et al. used picture frames of 129 people with shoulder pain, rated during acute pain elicited by motion [3]. They adopted Ekman's coding and utilized four certified human coders, who assigned pain intensity levels of 1-5 (1 = trace to 5 = maximum) to each picture frame. They summarized six AUs significantly associated with pain: brow lowering (AU4), orbital tightening (AU6 and AU7), levator contraction (AU9 and AU10), and eye closure (AU43). The proposed prediction model sums AU4, the higher of AU6 and AU7, the higher of AU9 and AU10, and AU43 (Pain = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43) [7].
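This Prkachin and Solomon pain formula can be sketched as a small function. The AU intensity inputs (0-5 scale) and the example frame values are illustrative; OpenFace© reports AU45 rather than AU43, which automated pipelines often substitute:

```python
def pspi_score(au: dict) -> float:
    """Prkachin & Solomon Pain Intensity from per-frame AU intensities (0-5).

    Pain = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43
    """
    return (au["AU4"]
            + max(au["AU6"], au["AU7"])
            + max(au["AU9"], au["AU10"])
            + au["AU43"])

# Illustrative frame: strong brow lowering and orbit tightening, eyes open.
frame = {"AU4": 3, "AU6": 2, "AU7": 4, "AU9": 1, "AU10": 2, "AU43": 0}
print(pspi_score(frame))  # 3 + max(2, 4) + max(1, 2) + 0 = 9
```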
A recent systematic review summarized the AUs consistently reported as related to pain; these comprised AU4, 6, 7, 9, 10, 25, 26, 27, and 43 [10].
Given these knowledge gaps, it remains questionable whether a machine learning model built on currently available automated facial analysis can be used as decision support. Therefore, this study aims to evaluate the accuracy of a model using OpenFace© data to classify pain severity in Asian elderly individuals receiving chronic pain care.

Study Design
This is a prospective registry building and analysis of facial expressions in elderly Thai individuals with chronic pain. The research project was approved by the Chiang Mai University Institutional Review Board (CMU IRB no. 05429), and informed consent was obtained from the patients themselves or their designated caregivers for both study participation and publication of images in an online open-access publication. All research was performed in accordance with the IRB regulations.

Study population
Cases were collected from the pain clinic, internal medicine ward, orthopedic ward, and nursing home institute of Chiang Mai University Hospital from May 2018 to December 2019. Patients were aged more than 55 years with a history of chronic (more than three months) and ongoing pain at the time of assessment.
Patients or caregivers able to communicate in the Thai language were included in this study.

Data collection
Research assistants enrolled participants on weekdays between 9 AM and 4 PM. The data collection form consisted of demographic questions. The cognitive status of each patient was evaluated using the Mini-Mental State Examination (MMSE) Thai 2002 [11]. The facial expression data were recorded by phone camera for 10 seconds at a 1-meter distance. Patients who could communicate were asked to give pain information just before video recording started. The pain information included pain site, quality, and self-rated severity on the visual analog scale and the Wong-Baker Face Scale (WBS). For patients unable to communicate, video was recorded during bed bathing, moving, or blood pressure measurement, activities assumed to elicit pain behavior. A research assistant trained in the PAINAD was assigned to rate the video clips of both communicative and noncommunicative patients. The Pain Assessment in Advanced Dementia (PAINAD) is a simple score based on five pain behavior observation domains: breathing, negative vocalization, facial expression, body language, and consolability [12].

Importing and preparing the dataset
The videos from the mobile camera feed were then manually trimmed to remove noise (e.g., frames where patients were talking). OpenFace© automatically coded the video clips into 18 facial action coding system-based AUs. The data are kept in the I-PAIN data repository for further research.
As the target class variable of this study, pain severity was defined by the self-rated WBS: mild for 0 and 2, moderate for 4 and 6, and severe for 8 and 10. For noncommunicative patients, pain levels were defined using the PAINAD rating by a trained research assistant: mild for 0-1, moderate for 2-4, and severe for five or more [14]. One-hot encoding [13] was applied as a premodeling step. The facial action unit time series generated by OpenFace©, representing each patient's facial movement over time, were transformed into two forms: 1) the average movement intensity [15] and 2) the area under the curve (AUC) around the maximum peak [8]. The AUC of each action unit was calculated using the data from 22 frames (0.03 seconds per frame) around the maximum peak. These two datasets were then explored to determine whether the action units were related to pain severity levels and could be used as attributes to classify pain intensity. Figure 1 outlines the entire data processing workflow. Demographic data, such as age, gender, and dementia status, were checked for missing values. One case was deleted due to a lack of MMSE information. Age was categorized into four classes: less than 60, 61 to 70, 71 to 80, and more than 80. Gender was coded 0 for female and 1 for male. Dementia was classified by MMSE cut point: a score of 18 or below for those who completed only lower education and 22 for those who completed higher education.

Classification models and model evaluation
WEKA software [18] was used for the data mining process. A total of 255 samples were split into training and test datasets at 70:30 (180:75). The synthetic minority oversampling technique (SMOTE) was applied for the imbalanced data. Tenfold cross-validation was applied for the attribute selection process.
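The two point-estimation transforms and the severity binning described above can be sketched as follows. This is a minimal illustration: function names are our own, and the 22-frame window around the peak and the 0.03-second frame duration follow the description in the text:

```python
import numpy as np

def mean_intensity(series):
    """Average AU intensity over all frames of the clip (transform 1)."""
    return float(np.mean(series))

def auc_around_peak(series, window=22, dt=0.03):
    """Area under the curve over `window` frames centred on the maximum
    peak (transform 2), using the trapezoidal rule with dt seconds/frame."""
    s = np.asarray(series, dtype=float)
    peak = int(np.argmax(s))
    lo = max(0, peak - window // 2)
    hi = min(len(s), lo + window)
    seg = s[lo:hi]
    # Trapezoidal rule: half-weight the endpoints, full weight in between.
    return float(dt * (seg[0] / 2 + seg[1:-1].sum() + seg[-1] / 2))

def wbs_to_class(wbs):
    """Wong-Baker score -> severity (mild 0-2, moderate 4-6, severe 8-10)."""
    return "mild" if wbs <= 2 else "moderate" if wbs <= 6 else "severe"

def painad_to_class(score):
    """PAINAD score -> severity (mild 0-1, moderate 2-4, severe >= 5)."""
    return "mild" if score <= 1 else "moderate" if score <= 4 else "severe"
```

For example, a flat series of intensity 1.0 gives `auc_around_peak` of 21 x 0.03 = 0.63 (22 frames, 21 trapezoids).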
The models were built using six commonly used classification machine learning techniques: generalized linear model, multilayer perceptron (a subtype of artificial neural network), J48 decision tree, naïve Bayes, k-nearest neighbors (KNN; in this study, the optimized K = 10), and a sequential minimal optimization support vector machine. In the model evaluation, ten iterations of 10-fold cross-validation were used on each data set, and the overall classification accuracy percentage was compared between the models.
The network plot of each AU was examined. The results indicated that the AUs were not independent; when grouped by a correlation of 0.3 or more, they appeared close to the grouping of facial anatomical movement, as shown in Figure 2. The closely related AUs, called coexisting AUs, were grouped. It should be noted that the area under the curve approach provided more accurate grouping than the average approach (e.g., AU1, AU2, and AU5 are anatomically closely related and were well acknowledged as coexisting in a previous study [19]).
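The study used WEKA, but the same comparison can be sketched in scikit-learn as a hypothetical analogue. All data here are synthetic stand-ins (255 samples, 11 features, 3 classes), and three repeats of 10-fold cross-validation are used for brevity instead of the study's ten:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in shaped like the study: 255 samples, AU-derived
# features plus gender, three severity classes.
X, y = make_classification(n_samples=255, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=0),   # analogue of WEKA J48
    "knn": KNeighborsClassifier(n_neighbors=10),      # K = 10 as in the study
    "svm": SVC(kernel="linear"),                      # analogue of SMO SVM
}

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
results = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean()
           for name, m in models.items()}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Note that SMOTE oversampling, as used in the study, would be applied inside each training fold (e.g., via an imbalanced-learn pipeline) to avoid leaking synthetic samples into the test folds.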

Descriptive analysis
The box plots comparing FAUs from the average activity approach and the area under the curve approach, with pain classified as mild, moderate, and severe, are shown in Fig. 3. ANOVA yielded significant differences for the average activities of AU4 (p=0.04), AU7 (p=0.005), AU10 (p=0.03), and AU25 (p=0.005), while the area under the curve approach identified AU17 (p=0.07) and AU23 (p=0.0045).
Since the FAUs were found to be interdependent, regression may not suit pain severity prediction. However, the average activity of AU4, AU7, AU10, and AU25, the area under the curve of AU17 and AU23, dementia, and sex showed relationships with pain severity. Therefore, these features were included in the classification model.
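The ANOVA screening step above can be sketched as follows. The per-group AU values are randomly generated for illustration only; a small p-value flags the AU as a candidate classifier feature, as for AU4, AU7, AU10, and AU25 in the text:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Illustrative per-patient average AU4 intensity, grouped by rated severity
# (means and sample sizes are invented for the sketch, not study data).
mild = rng.normal(0.4, 0.2, 40)
moderate = rng.normal(0.6, 0.2, 40)
severe = rng.normal(0.9, 0.2, 40)

# One-way ANOVA: does mean AU activity differ across severity groups?
f_stat, p_value = f_oneway(mild, moderate, severe)
print(f"F = {f_stat:.2f}, p = {p_value:.2g}")  # p < 0.05 -> candidate feature
```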

Classification models and model evaluation
There are two sources of features for building pain severity classification models: pain-related AUs consistently found in previous studies (AUs 4, 6, 7, 9, 10, 25, 26, 27, and 45) and features selected by machine learning. The features selected by each machine learning method are presented in Table 2, and the accuracy of each model is presented in Table 3. Machine-selected features provided the best accuracy. The SVM model using the average activities of AUs 1, 2, 4, 7, 9, 10, 12, 20, 25, and 45 plus gender yielded an accuracy of 58%. The second most accurate model was the KNN model using the area under the curve of AU1, AU2, AU6, and AU20 plus female gender, which provided an accuracy of 56.41%. Among the features selected from pain-related FAUs in previous studies, the most accurate models were the multilayer perceptron (50%) and KNN (44.87%). The confusion matrix between pain classified as mild, moderate, or severe by either WBS or PAINAD (actual severity) and the SVM model with average values from OpenFace© showed that misclassification of moderate pain is higher than that of mild or severe pain. The ROC areas for mild, moderate, and severe pain were 0.514, 0.408, and 0.496, and the F-measures were 0.651, 0.333, and 0.560, respectively, as described in Table 4 (confusion matrix between actual severity and model classification for the pain severity classification task).
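Per-class precision, recall, and F-measure, such as those reported in Table 4, can be derived directly from a confusion matrix. The matrix below is purely hypothetical (not the study's counts), chosen so that the moderate class is misclassified most, mirroring the reported pattern:

```python
import numpy as np

def per_class_scores(cm):
    """Per-class precision, recall, and F1 from a square confusion matrix
    (rows = actual class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # correct / all predicted as class
    recall = tp / cm.sum(axis=1)      # correct / all actually in class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical 3-class matrix, ordered mild / moderate / severe.
cm = [[20,  8,  2],
      [10, 12,  8],
      [ 3,  7, 20]]
prec, rec, f1 = per_class_scores(cm)
print(np.round(f1, 3))  # moderate F1 is lowest, as in the study's pattern
```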

Discussion
The main finding for chronic pain in elderly Asian individuals is that the pain severity classification (mild, moderate, severe) model developed from OpenFace© data is not robust. The best model, SVM with the average activities of AUs 1, 2, 4, 7, 9, 10, 12, 20, 25, and 45 plus gender, yielded an accuracy of 58%.
There are three possible explanations and corresponding ways to improve the accuracy of the model. First, OpenFace© has limitations in detecting facial action units in elderly individuals in pain. Second, the correlation between facial expression and perceived pain severity in chronic pain patients may not be very high. Third, the point estimation method may not be optimized.
We defined the pain severity-related facial action units in two ways: the AUs described as consistently pain-related in the systematic review [10] and machine learning selection. The AUs that consistently overlapped between these two methods, including AU4, AU7, AU10, and AU45, were already described in human coding studies [3].
However, it is noted that the machine learning technique also chose AU1, AU2, and AU20, which are defined as fear expressions, while AU9, AU10, and AU17 have been described as related to disgust expressions. This may support the 'total pain' theory that chronic pain comprises complex emotional components [26]. The machine learning also showed the contribution of gender and dementia, but not increasing age, to the model. This finding is consistent with previous studies indicating that gender influences pain severity expression, with women being more expressive [8] and possibly having fairer skin, which influences model accuracy in facial landmark detection [6]. Previous studies have shown that elderly individuals with dementia tend to show more activity in the mouth area than in the upper part of the face [21].
The confusion matrix indicates higher misclassification for moderate pain than for the more clear-cut mild or severe pain. This behavior of pain classification models has been described in a previous study on the accuracy of OpenFace© in classifying pain severity from the UNBC McMaster dataset [15]. Therefore, it may not be feasible to adopt the current automated pain severity classification model for critical decision-making, such as adjusting the dosage of opioid analgesics. On the other hand, it may be more suitable for augmenting gross triage tasks, such as supporting evidence for self-rated severity.

Strengths and limitations
To our current knowledge, this study is one of few extensive facial recognition studies conducted in an Asian elderly population. It was also conducted in a natural setting in which stakeholders benefit from the solution. The information from our research may help fill the representation bias of current models.
It also directly shows that currently available open-source facial analysis software and classification models still need considerable accuracy improvement to classify elderly pain.
The major limitation of this study is that we did not validate the facial landmarks from OpenFace© against standard human coding. This was due to the limited personnel resources available for training at the time. This defect raises the question of whether the AUs reported in this study correspond to the same anatomical landmarks described in the standard FACS.
The second limitation concerns the strategy for estimating a point value from the time series produced by OpenFace©. A previous study found that 80% of the frames in the UNBC McMaster video clips show no pain [22]. Therefore, the average method may be influenced by 'no pain' frames, whose proportion varies from person to person. The area under the curve approach, whose activity also correlates well with anatomical facial muscle movement, may mitigate this effect. Unfortunately, our experiment showed no advantage of this approach. A minor limitation is the time lag between the patient's self-report and the start of the video recording. Although brief, this may decrease the association between the patient's rating and the face in the video clip.
Implications and suggestions for further research
The next step for this project is to collaborate with computer engineers to improve the algorithm. Trained human coders will validate the facial landmarks of the new algorithm. We will also create a data collection application that silently captures patients' faces while they slide the pain scale and may apply an active appearance model to select 'possible pain' frames. We may focus on the most consistently pain-related AUs, such as AU4, AU7, AU10, and AU45, to develop real-time sensing and predict extreme pain severities (e.g., no pain and severe pain) to check whether the patient's pain self-rating is valid. This will help gather more information along with clinical usability. The collected training data will then be used to examine the clinical outcome of using an automated pain severity monitoring platform to support decision-making regarding pain medication.

Conclusion
Our open-source automatic video clip facial action unit analysis experiment was not robust for classifying elderly pain. Retraining facial action unit algorithms, enhancing frame selection strategies, and adding pain-related functions may improve the accuracy and feasibility of the model.

Data accessibility
The CSV file can be downloaded at https://w2.med.cmu.ac.th/agingcare/indexen.html
Figure 1. Flow to identify the pain intensity-related facial action units.
Figure 2. The network plot of the AUs from the average approach (left) and the area under the curve (AUC) approach (right).