Individuals with Upper Limb Loss Require Minimal Training to Achieve Robust Motion Classification Using Sonomyography

Background: Although surface electromyography is commonly used as a sensing strategy for upper limb prostheses, it remains difficult to reliably decode the recorded signals for controlling multi-articulated hands. Sonomyography, or ultrasound-based sensing of muscle deformation, overcomes some of these issues and allows individuals with upper limb loss to reliably perform multiple motion patterns. The purposes of this study were to determine 1) the effect of training on classification performance with sonomyographic control and 2) the effect of training on the underlying muscle deformation patterns.

Methods: A series of motion pattern datasets was collected from five individuals with transradial limb loss. Each dataset contained five ultrasound images corresponding to each of the following five motions: power grasp, wrist pronation, key grasp, tripod, and point. Participants initially performed the motions for the datasets without receiving feedback on their performance (baseline phase), then with visual and verbal feedback (feedback phase), and finally again without feedback (retention phase). Cross-validation accuracy and metrics describing the consistency and separability of the muscle deformation patterns were computed for each dataset. Changes in classification performance over the course of the study were assessed using linear mixed models. Associations between classification performance and the consistency and separability metrics were evaluated using Pearson correlations.

Results: The average cross-validation accuracy for each phase was 92% or greater and there was no significant change in cross-validation accuracy throughout training. Misclassifications of one motion as another did not persist systematically across datasets. Few of the correlations were significant, although many were moderate or greater in strength and showed a positive association between accuracy and improved consistency and separability metrics.

Conclusions: Participants were able to achieve high classification rates upon their initial exposure to sonomyography and training did not affect their performance. Thus, motion classification using sonomyography may be highly intuitive and is unlikely to require a structured training protocol to gain proficiency.

Individuals with limb loss may be further disadvantaged by motor cortex reorganization following amputation (26), as well as muscle atrophy due to disuse of the residual limb and/or increased reliance on the intact limb (27). Given these difficulties, it is unsurprising that first attempts to use pattern recognition are often error-prone. For example, initial classification accuracies for individuals with transradial limb loss have been reported to range from 46.37% (28) to 77.5% (29). Training over the course of multiple sessions or days appears to mitigate some of these errors for individuals with and without limb loss, regardless of whether feedback on their performance is provided (29-32). These improvements are credited to changes in the EMG signal patterns such that they become more consistent and/or separable, although the correlation between performance and EMG pattern characteristics is complex and not yet fully understood (32).

To overcome these problems with myoelectric control, some researchers are exploring the use of sonomyography (SMG), or ultrasound-based sensing of muscle deformations. This modality avoids many of the limitations of EMG because it can spatially resolve individual muscles with sub-millimeter precision, including those in deep-seated muscle compartments. As a result, cross-talk is effectively suppressed and the control signals derived from the detected muscle activity have a high signal-to-noise ratio. Prior studies have demonstrated clear potential for the use of SMG in controlling a prosthesis or other human-

All datasets were collected in succession on a single day without repositioning the ultrasound transducer, except for Am7. He had a longer testing session because he needed the pace of motion performance slowed from one second to two seconds per cue. As a result, he required a break between collection of the third and fourth datasets and requested to have the transducer removed. Additionally, Am5 terminated the testing session early, before datasets in the retention phase were collected, because of a scheduling conflict. He returned three days later to complete a full testing session. Data for both sessions were retained for analysis in this case.

Data analysis
The primary outcome metric was cross-validation accuracy (Eq. 1), defined as the percentage of data correctly classified during the leave-one-out validation process for a given dataset i:

$$\mathrm{CVA}_i = \frac{n_{\mathrm{correct},i}}{n_{\mathrm{total},i}} \times 100\% \tag{1}$$

where $n_{\mathrm{correct},i}$ is the number of correct predictions by the closest-class classifier and $n_{\mathrm{total},i}$ is the total number of predictions (i.e., the total number of datapoints).

Cross-validation accuracy is a combined measurement of the user's ability to perform a motion and the classifier's ability to label individual motion performances. Since user performance and classifier performance are inherently linked in this metric, it is possible that a user's performance could change over time without affecting the cross-validation accuracy. For example, a user may perform the tripod grasp with very little variation for a given dataset, resulting in a high cross-validation accuracy. On the next dataset, they may perform the grasp with two different variations having slightly different levels of middle finger flexion. As long as the closest identified class for each of the variations is still tripod, the cross-validation accuracy would be unaffected. Therefore, to better understand the changes in user performance independent of classifier performance, we represented the 100 × 140 pixel images in our dataset as points in 14,000-dimensional space such that each pixel in the image corresponds to an axis in the high-dimensional space. We can then define point clusters in this high-dimensional space such that each cluster comprises all the points in an associated motion class. We used metrics from the unsupervised learning literature to describe the characteristics of these clusters, and thus of the performances of each motion.

The clustering metrics used in our analysis include the Caliński-Harabasz (CH) Index (45), the Silhouette Index (46), and the S_Dbw Index (47). The CH Index and Silhouette Index are both commonly used in the unsupervised learning literature, while the S_Dbw Index is less common but has been shown to be more robust (48). Each of these metrics combines a measurement of cluster consistency with a measurement of cluster separability. We therefore also discuss these constituent components (consistency and separability) as supplementary metrics to better understand inter-motion and intra-motion behavior independently of each other. If a user's performance of a given motion becomes more consistent with other performances of the same motion, the points in that motion cluster move closer together and the consistency measurements improve. If a user's performance of a given motion becomes more distinct from the performances of another motion, the clusters themselves move further apart and the separability metrics improve. A more detailed explanation of the clustering metrics and their consistency and separability constituents is provided below.
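As a rough illustration of how a leave-one-out cross-validation accuracy of this kind can be computed, the sketch below holds out each datapoint in turn and labels it with the class of its nearest remaining neighbor. It assumes a plain 1-nearest-neighbor rule with Euclidean distance, which may differ from the closest-class classifier actually used in the study; the function name `loo_accuracy` and the two-dimensional toy points (standing in for 14,000-dimensional image vectors) are hypothetical.

```python
import math


def loo_accuracy(points, labels):
    """Leave-one-out accuracy (%) of a plain 1-nearest-neighbor rule.

    Assumption: Euclidean distance and an unmodified 1-NN rule; the
    study's own classifier may differ in its details.
    """
    correct = 0
    for i, (x, label) in enumerate(zip(points, labels)):
        # Index of the closest point, excluding the held-out point itself.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(x, points[j]),
        )
        if labels[nearest] == label:
            correct += 1
    return 100.0 * correct / len(points)


# Hypothetical toy data: each tuple stands in for a flattened ultrasound image.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
toy_labels = ["power", "power", "power", "point", "point", "point"]

toy_accuracy = loo_accuracy(toy_points, toy_labels)
print(f"Leave-one-out accuracy: {toy_accuracy:.1f}%")
```

In this toy set the last point lies inside the "power" cluster and is therefore misclassified, so five of the six held-out points are labeled correctly.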

CH Index
The CH Index is a positive unbounded measurement where higher values indicate more consistent and/or separable clusters. It has been shown to be robust when evaluating clusters that may have varying densities and may be composed of subclusters themselves, but can be susceptible to errors when evaluating clusters with noise in the data or clusters with imbalance (48). The CH Index (Eq. 2) is defined as a ratio of the variance of the cluster centroids (CH-Separability) to the variance within each cluster (CH-Consistency):

$$CH = \frac{\sum_{i=1}^{k} CHS_i}{\sum_{i=1}^{k} CHC_i} \cdot \frac{N - k}{k - 1} \tag{2}$$

where $N$ is the number of datapoints, $k$ is the number of clusters, $CHS_i$ is the CH-Separability metric for a cluster $C_i$, and $CHC_i$ is the CH-Consistency metric for $C_i$. CH-Separability and CH-Consistency are defined as:

$$CHS_i = N_i \, \lVert c_i - c \rVert^2 \tag{2.1}$$

$$CHC_i = \sum_{x \in C_i} \lVert x - c_i \rVert^2 \tag{2.2}$$

where $N_i$ is the number of points in $C_i$, $c_i$ is the cluster centroid of $C_i$ (average of all points in $C_i$), $c$ is the data centroid or the average of all datapoints, and $x$ is a datapoint in a given cluster $C_i$.

Silhouette Index
The Silhouette Index is a bounded measurement in the range [-1, 1] where higher values indicate more consistent and/or separable clusters. It has been shown to be robust when evaluating clusters with noise in the data, clusters that may have varying densities, and clusters with imbalance, but can be susceptible to errors when evaluating clusters that may be composed of subclusters themselves (48). The Silhouette Index (Eq. 3) is defined on a per-point basis and averaged across all $N$ datapoints in a given dataset $D$:

$$S = \frac{1}{N} \sum_{x \in D} \frac{SS(x) - SC(x)}{\max\{SS(x),\, SC(x)\}} \tag{3}$$

where $SS(x)$ represents the separability for a given datapoint $x$ and $SC(x)$ represents the consistency for a given datapoint $x$.
$SS(x)$ is defined as the minimum over clusters of the average distance between $x$ and points in another cluster:

$$SS(x) = \min_{i \,:\, C_i \neq C_x} \frac{1}{N_i} \sum_{y \in C_i} \lVert x - y \rVert \tag{3.1}$$

where $N_i$ is the number of points in $C_i$, $C_x$ is the cluster containing $x$, and the condition $C_i \neq C_x$ ensures that the SS calculation only considers distances to neighboring clusters and ignores distances to points in the same cluster as $x$. $SC(x)$ is defined as the average distance between $x$ and points in its own cluster:

$$SC(x) = \frac{1}{N_x - 1} \sum_{y \in C_x,\; y \neq x} \lVert x - y \rVert \tag{3.2}$$

where $N_x$ is the number of points in $C_x$, so that the distances in SC are computed only to points in the same cluster as $x$.

Although the Silhouette Index is defined on a per-point basis, we represent the separability and consistency constituents as averages across all clusters. Thus, the Silhouette-Separability score $SS_i$ is defined as the average separability over all points in a given cluster $C_i$:

$$SS_i = \frac{1}{N_i} \sum_{x \in C_i} SS(x) \tag{3.3}$$

Similarly, the Silhouette-Consistency score $SC_i$ is defined as the average consistency over all points in a given cluster $C_i$:

$$SC_i = \frac{1}{N_i} \sum_{x \in C_i} SC(x) \tag{3.4}$$

S_Dbw Index
The S_Dbw Index is a positive unbounded measurement where lower values indicate more consistent and/or separable clusters. It has been shown to be robust when evaluating clusters with noise in the data, clusters that may have varying densities, clusters that may be composed of subclusters themselves, and clusters with differences in the number of points in each cluster (48). The S_Dbw Index (Eq. 4) is defined as the sum of a scatter measurement (S_Dbw-Consistency) and a density-between-clusters measurement (S_Dbw-Separability):

$$S\_Dbw = \frac{1}{k} \sum_{i=1}^{k} SDC_i + \frac{1}{k} \sum_{i=1}^{k} SDS_i \tag{4}$$

where $k$ is the number of clusters, $SDC_i$ is the S_Dbw-Consistency score for a given cluster $C_i$, and $SDS_i$ is the S_Dbw-Separability score for a given cluster $C_i$.
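$SDC_i$ is defined as the magnitude of the variance of $C_i$ divided by the magnitude of the variance of the whole dataset $D$:

$$SDC_i = \frac{\lVert \sigma(C_i) \rVert}{\lVert \sigma(D) \rVert} \tag{4.1}$$

$SDS_i$ is defined as the average of a ratio of densities $d(m, C_{ij})$:

$$SDS_i = \frac{1}{k - 1} \sum_{j \neq i} \frac{d(m_{ij}, C_{ij})}{\max\{d(c_i, C_{ij}),\, d(c_j, C_{ij})\}} \tag{4.2}$$

where $C_{ij}$ is the union of clusters $C_i$ and $C_j$, $c_i$ is the centroid of cluster $C_i$, $c_j$ is the centroid of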
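To make the consistency and separability constituents more concrete, the following sketch computes the CH Index with its per-cluster components and the Silhouette Index for a small set of labeled points. It follows the standard textbook definitions of both indices, which is an assumption: the exact normalization used in the study cannot be confirmed from the text. The function names and the two-dimensional toy clusters are hypothetical, and every cluster is assumed to contain at least two points.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def ch_index(clusters):
    """Return (CH Index, per-cluster separability, per-cluster consistency).

    clusters: dict mapping a motion label to its list of points.
    Standard Calinski-Harabasz form (assumed, not taken from the study).
    """
    all_points = [p for pts in clusters.values() for p in pts]
    n, k = len(all_points), len(clusters)
    data_centroid = centroid(all_points)
    separability, consistency = {}, {}
    for label, pts in clusters.items():
        c_i = centroid(pts)
        separability[label] = len(pts) * sq_dist(c_i, data_centroid)
        consistency[label] = sum(sq_dist(x, c_i) for x in pts)
    ratio = sum(separability.values()) / sum(consistency.values())
    return ratio * (n - k) / (k - 1), separability, consistency


def silhouette_index(clusters):
    """Mean silhouette over all points (standard definition, assumed)."""
    scores = []
    for label, pts in clusters.items():
        for i, x in enumerate(pts):
            # Consistency: mean distance to the other points in x's own cluster.
            own = [y for j, y in enumerate(pts) if j != i]
            sc = sum(math.dist(x, y) for y in own) / len(own)
            # Separability: smallest mean distance to any other cluster.
            ss = min(
                sum(math.dist(x, y) for y in other_pts) / len(other_pts)
                for other, other_pts in clusters.items()
                if other != label
            )
            scores.append((ss - sc) / max(ss, sc))
    return sum(scores) / len(scores)


# Hypothetical toy clusters standing in for two motion classes.
toy_clusters = {
    "tripod": [(0.0, 0.0), (0.0, 2.0)],
    "point": [(4.0, 0.0), (4.0, 2.0)],
}
toy_ch, toy_chs, toy_chc = ch_index(toy_clusters)
toy_silhouette = silhouette_index(toy_clusters)
print(toy_ch, toy_chs, toy_chc, round(toy_silhouette, 4))
```

Moving the two toy clusters further apart raises the separability terms, while tightening the points within a cluster lowers the consistency terms; either change increases both indices.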

Statistical analysis
To determine the effect of training and feedback on classification performance, we fit the following linear mixed model:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + b_i + \varepsilon_{ij}$$

where $y_{ij}$ is the overall cross-validation accuracy for the ith subject on the jth dataset. In this model, $x_{1ij}$ is the normalized elapsed time since the collection of the first dataset, $x_{2ij} = 1$ if the jth dataset is in the feedback phase and $x_{2ij} = 0$ otherwise, $x_{3ij} = 1$ if the jth dataset is in the retention phase and $x_{3ij} = 0$ otherwise, $\varepsilon_{ij}$ is the residual error, and $b_i$ is a random intercept accounting for within-subject correlations among repeated measures. Both $\varepsilon_{ij}$ and $b_i$ are assumed to be normally distributed and independent. The baseline phase is treated as the reference level. To account for the small sample size and potential violation of the model assumptions, we used a permutation test (49) to assess the significance of the effects of training and feedback on the overall cross-validation accuracy (α = 0.05).

To determine the effect of training and feedback on the muscle deformation patterns, we first calculated the average overall cross-validation accuracy and the average accuracy per grasp across subjects for each of the three phases. We then calculated the change in accuracy between the baseline and feedback phases, as well as between the baseline and retention phases. Similarly, we calculated the change in the clustering metrics between phases. Finally, we calculated Pearson's correlation coefficients between the changes in these metrics and the changes in accuracy rates. To account for the small sample size, we used a permutation test (49) to test the null hypothesis that there was no correlation (α = 0.05).
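The permutation test for a correlation can be sketched as follows. This is a minimal illustration, assuming a two-sided test that enumerates every reordering of one variable, which is feasible for very small samples; whether the study enumerated permutations exactly or sampled them at random is not stated. The function names and the numeric values are hypothetical and are not study data.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Two-sided exact permutation p-value for the Pearson correlation.

    Counts the reorderings of y whose |r| is at least as large as the
    observed |r| (a small tolerance absorbs floating-point noise).
    """
    observed = abs(pearson_r(x, y))
    permutations = list(itertools.permutations(y))
    extreme = sum(
        1 for perm in permutations if abs(pearson_r(x, perm)) >= observed - 1e-12
    )
    return extreme / len(permutations)


# Hypothetical per-subject changes (five subjects); not data from the study.
change_in_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
change_in_accuracy = [2.0, 4.0, 6.0, 8.0, 10.0]

toy_r = pearson_r(change_in_metric, change_in_accuracy)
toy_p = permutation_p_value(change_in_metric, change_in_accuracy)
print(f"r = {toy_r:.3f}, permutation p = {toy_p:.4f}")
```

With five paired observations there are only 120 possible orderings, so the smallest attainable two-sided p-value is 2/120, or about 0.017. This illustrates why even very strong correlations sit close to the significance threshold in a sample of this size.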

Results
Cross-validation accuracy exceeded 76% for all 57 datasets collected across all subjects and was at least 92% for 45 of the datasets. Furthermore, the average cross-validation accuracy for each phase was at least 92% (baseline: 94.4 ± 3.1%; feedback: 95.4 ± 3.6%; retention: 92.0 ± 7.1%; Figure 1). The entire data collection session was somewhat lengthy since at least nine datasets per participant were collected, with the exception of the first session for Am5.

There were few significant correlations between changes in cross-validation accuracy and changes in the clustering metrics (Table 3, Table 4). The change in accuracy for the tripod grasp between the baseline and feedback phases was significantly correlated with change in the S_Dbw Index (r = -0.896, p = 0.045) and S_Dbw-Consistency (r = -0.963, p = 0.031), while the change in overall accuracy was significantly correlated with change in the CH Index (r = 0.981, p = 0.016). Between the baseline and retention phases, the change in accuracy for the point grasp was correlated with the change in Silhouette-Separability (r = 0.884, p = 0.044) and S_Dbw-

Although no other correlations were statistically significant, many were moderate or greater in strength (|r| > 0.5). Furthermore, the direction of the correlations was generally consistent with our expectation that improvements in accuracy would relate to improvements in clustering behavior. There were a few statistically nonsignificant exceptions (Table 3, Table 4), but their relevance cannot be determined from the current study because of the small sample size.

Discussion
The primary purpose of this study was to determine the effect of training on classification performance. Although we expected that performance would improve with provision of feedback during training, our results did not support this hypothesis.
In fact, all subjects were able to achieve successful classification without any instruction right after they started the study
undertaking in either case. However, we anticipate that the benefit of undergoing a structured training protocol for learning to generate consistent and separable SMG signals would be low. A reduced need for this initial training could enable patients to devote more resources towards functional training with a physical prosthesis, which may still require involvement from a therapist. Interestingly, preliminary work from our group suggests that the need for functional training may also be reduced for patients using SMG compared to other control strategies. Am3 was able to operate a sonomyographic prosthesis and complete a functional task immediately after donning it, despite receiving no specific instructions on how to approach the task. Although Am3 was an experienced user of a direct control myoelectric prosthesis, his performance with the sonomyographic prosthesis was visibly smoother and involved less compensatory movement (Additional File 3).

A reduced need for controls and/or functional training may ultimately help diminish barriers to prosthesis access in the United States, where few clinicians specialize in caring for people with upper limb loss or have experience with justifying a course of treatment to insurers (50). For these reasons, it is perhaps unsurprising that one survey found that 35% of those with unilateral upper limb loss received no training of any kind and only 22% received more than 10 hours of training from a prosthetist or therapist (51). Unfortunately, therapy is an essential component of the rehabilitation process and the receipt of training to use a first prosthesis has been associated with increased satisfaction (7).
With a more intuitive control strategy enabled by SMG, there may be a potential for increased satisfaction without the need for extensive involvement from a therapist. Experiencing an early sense of accomplishment from successfully learning the control strategy may also motivate users to continue practicing with the prosthesis and could reduce the likelihood that they abandon prosthesis use.

Although our participants had fairly similar classification performances, it is worth highlighting individual participant performances in order to exemplify some advantages of SMG. In particular, Am5 was fully naïve to the use of SMG during his first data collection session but achieved perfect classification on the first dataset prior to receiving any feedback. He also maintained an average of 99% accuracy across all six datasets. Am7 was also fully naïve to the use of SMG, but had slightly poorer classification performance in comparison to the other participants and required nearly double the amount of time to create each dataset. Based on comments from Am7 and our observation of his SMG signals, it appears that he had a difficult time relaxing his muscles to a "resting" position in between the repeated grasps. Am7 had undergone amputation slightly over a year before participating in this study and was a highly inexperienced myoelectric prosthesis user, having only owned his prosthesis for one week. He had significant muscle atrophy in his residual limb as a result of this disuse, which may have contributed to his difficulties. While this could suggest that having general familiarity with prosthesis use may impact an individual's ability to produce appropriate SMG signals, it actually seems that SMG motion classification can be easily learned even by those lacking prior prosthesis experience. Indeed, Am7 achieved an average cross-validation accuracy of 89% across nine datasets even as a novice prosthesis user.
Our finding that most participants were able to generate the requisite control signals on the very first dataset without being provided any instruction is indicative of the intuitive nature of SMG. Because SMG relies on sensing muscle deformations and these deformations are directly related to the proprioceptive afferents in muscle spindles, SMG control is highly congruent with the underlying proprioceptive sense in the residual limb musculature. Furthermore, the richness of the ultrasound features in high-dimensional space means that our algorithms can more easily distinguish the user's natural motion patterns. Users can therefore rely on proprioception and may not need other cues to monitor their performance. In fact, others have demonstrated that able-bodied individuals can successfully modulate the degree of muscle activation to a desired level even when relying only on the implicit feedback available through proprioception (52). Am8's results provide an interesting demonstration of this concept. Her average cross-validation accuracy was slightly worse during the feedback phase compared to the baseline and retention phases, suggesting that she performed best when relying on her own intuition rather than following explicit instructions. Although Am8 had one exposure to SMG five years prior to the current study, she regularly used a single degree-of-freedom myoelectric hand in her daily life and thus had minimal experience with gesture recognition to guide her performance.

The second purpose of this study was to determine the effect of training on muscle deformation patterns. We hypothesized that changes in the patterns would correlate with changes in classification accuracy over training, but we did not see this trend in our results (Additional File 2).
Few of the pattern characteristics were significantly correlated with changes in classification accuracy; most notably, changes in the CH Index were correlated with changes in overall accuracy between the baseline and feedback phases. Classification was performed using a modified 1-nearest-neighbor classifier, in which motions are assigned to the nearest class within the high-dimensional feature space. The CH Index, a ratio between the average inter-cluster and intra-cluster distances, is effectively a measure of distance between neighboring clusters. Thus, correlation between this distance-based metric and distance-based classification is to be expected.

Although most other correlations were statistically nonsignificant, there are a few interesting trends to note in the magnitude of the correlation coefficients. In particular, the magnitude tended to be slightly higher for the consistency metrics than for the separability metrics, indicating that changes in accuracy could be related more closely to greater intra-motion consistency than to greater inter-motion separability. This would mean that participants became more consistent in executing the grasps but did not actually change how they executed them relative to the other motions. This is well aligned with the idea that users may be able to rely on their innate proprioception when using SMG and that SMG is an intuitive control paradigm due to its direct relationship with proprioception. In particular, the algorithm is able to decode the user's intent without the user needing to adapt to the algorithm. Nonetheless, it must be re-emphasized that these correlations were nonsignificant and should not be overinterpreted.

Taken together, these findings suggest that participants were able to generate separable movements right away and were able to consistently repeat those movements without requiring external feedback.
We believe this finding represents a significant advantage over pattern recognition control, which similarly requires that EMG signal patterns are distinct and repeatable. Unfortunately, people do not naturally have experience with modulating EMG patterns to meet these requirements (25), nor is it clear which EMG pattern characteristics are
effectively train users on pattern recognition, possibly limiting user motivation or interest in adopting this technology. Delineating the relationship between signal pattern characteristics and classification performance appears to be less critical to an individual's success with SMG than with EMG, as users seem capable of achieving successful classification without intervention.

There are several limitations to this study. First, the majority of our participants were not fully naïve to SMG prior to participation in this study. It is possible that this prior experience could have improved their performance on this protocol, but we believe this is unlikely given that a minimum of nine months had passed since their most recent exposure. They also returned to using a myoelectric prosthesis during the intervening time, which may have interfered with the retention of any motor learning or skills obtained in previous sessions.

Additionally, we tested the classification performance on a limited time scale while subjects remained stationary. It is well known that EMG classification can degrade in response to changes in arm position, electrode shifting, sweating, muscle fatigue, or changes in signal characteristics over time (53). Similar issues may occur with SMG classification, which would require users to retrain the classifier after some period of use. Even if such deterioration in performance occurs, it does not invalidate our current finding that users can initially achieve robust motion classification with minimal training.
Another limitation of this study was that we used a commercially available ultrasound imaging system with an array transducer. For translation of SMG technology to practical prosthesis sockets, we anticipate using single-element transducers with low-power electronics. Our previous work has indicated that classification accuracy is not compromised with sparse sensing (54). However, this result has yet to be validated in individuals with limb loss. We are currently developing fully integrated prototype SMG systems and additional studies are planned. It should also be noted that the reported classification accuracies were obtained using a 1-nearest-neighbor classifier. We purposely used one of the simplest classifiers in an effort to decouple user performance from classifier performance. More sophisticated classifiers, such as the linear discriminant analysis commonly used for EMG pattern recognition, are expected to provide improved classification accuracy for SMG data as well.

Finally, the sample size for this study was small. This may have reduced our ability to detect statistically significant results, especially for the correlations between classification performance and muscle deformation patterns. More of the correlations could prove to be significant if this protocol were replicated in a larger group of participants. Furthermore, we cannot fully distinguish between the effects of repetitive practice and provision of feedback on classification performance. There could have been an interaction between these factors, but we could not include an interaction term in the linear mixed model due to the small sample size.

Figure 1
Average between-subjects (grey bars) and within-subject (colored bars) cross-validation accuracy for each phase. Error bars represent standard deviation.

Figure 2
Cross-validation accuracy as function of elapsed training time for individual participants. The three sections of each plot correspond to the baseline, feedback, and retention phases. The break between the third and fourth datasets for Am7 indicates that the transducer was removed and repositioned.

Figure 3
Classification performance across individual datasets for Am3 (top) and Am7 (bottom). The confusion matrices have been adapted to represent the temporal evolution of classification performance across all datasets (55). The squares in each confusion matrix are divided into n columns representing the n collected datasets for that subject. Thus, confusion between grasps for individual datasets is illustrated by the individual columns. For example, power grasp was identified correctly for four out of five motion instances during Am7's first dataset (yellow bar) and was confused with point on one instance (maroon bar).

Table 3. Pearson correlation coefficients between changes in accuracy rates from baseline to feedback and changes in the separability and consistency metrics from baseline to feedback.