A speech prosthetic is essentially a brain-computer interface connecting the speech areas of cortex to a computer. More specifically, in the iteration reported herein, it records signals from the motor articulatory area of the brain and transmits these neural signals to a computer that decodes them to produce speech in real time. There are several other approaches to developing a speech prosthesis. Most recently, Willett et al. [1] recorded from human arm motor cortex using Utah arrays recording multi-units and provided a handwriting facility, easily convertible to speech. Other researchers [2, 3, 4] have used micro-electrode arrays, also recording multi-units, to control paralyzed limbs and robotic limbs. ECoG electrodes placed on the cortical surface have been used by Chang and his group [5] to develop a speech prosthesis, using frequency-domain analysis to interpret and produce the intended speech. In the present report, the Neurotrophic electrode is used to record single units [6]. A thorough review of these techniques is outside the scope of this paper and is published elsewhere [7].
The locked-in syndrome is consequent on two major conditions: 1] amyotrophic lateral sclerosis, a devastating and slowly progressive paralysis; and 2] brainstem stroke, a sudden and complete paralysis. In either case, it leads to loss of use of the articulators, and hence loss of speech, even though the neural substrate in the cortical and sub-cortical areas is intact. These cortical areas are the targets of research efforts to develop a system of stable and long-lasting recordings. It follows, therefore, that only those electrodes with proven stability and duration should be candidates for human implantation. The systems discussed above have been demonstrated to produce functional signals (individual units, multi-units, frequency-domain surface recordings) for months and years. Published data on the Utah array, however, indicate loss of 85% of single units after 3 years [8], though anecdotal evidence suggests longer duration of multi-unit signals. ECoG array duration was assessed histologically at 22 months, with recordings continuing up to 18 months [9]. Histological analysis indicated giant cells and macrophages at the interface, with encapsulation of the electrodes by collagenous tissue. This histology strongly suggests attempts to reject the implant. A more recent configuration is the placement of multiple gold electrodes on the cortical surface, but no long-term data have been produced to validate the longevity of this approach [10].
Another approach to avoiding rejection of the electrode is to grow the neuropil inside the electrode tip. Using this approach, data indicate stability of single units up to four years in four participants and nine years in participant 5 of this study [6, 11, 12, 13]. The stability of single unit recordings after five years of implantation was such that the participant could control movements of a cursor in a 2D formant frequency plane to activate vowel sounds [14, 15, 16], and phones and words could be decoded [6]. Post-mortem histological verification 13 years after implantation indicated no gliosis, abundant myelinated axons, and normal neuropil except for the absence of neurons (the neuronal cell bodies remain outside the glass tip and grow neurites into the tip under the influence of neurotrophic factors) [17]. These histological findings are identical to those of prior rat and primate studies using light and electron microscopic techniques [18]. In addition, single units remained functional at year nine [13]. Thus the longevity and stability of recorded single units, as well as the lack of gliosis, have been established with the Neurotrophic electrode. So far, no other electrode has been shown to have these capabilities.
Advances in decoding with neural net paradigms take decoding to another level. These paradigms allow accurate classification of phones, as described in this paper, and of phrases, as in a prior paper [6]. The classification includes results with slow-firing as well as fast-firing single units. This paper focuses on both fast-firing (5 impulses per second (ips) and above) and slow-firing units, with particular emphasis on the importance of slow-firing single units for improved classification accuracy. There is no study known to these authors in which consistently slow-firing single units were specifically related to the task at hand. Some single units are known to cease firing and then burst at high frequency, but consistently slow, non-bursting units firing in the 0 to 5 ips range are not known to have been studied, at least with respect to neural prosthetic usage.
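The fast/slow distinction above can be illustrated with a minimal sketch: units are partitioned by mean firing rate around the 5 ips threshold stated in the text. The threshold comes from the paper; the function names and the synthetic spike times below are illustrative assumptions, not part of the actual recording or analysis pipeline.

```python
import numpy as np

# 5 ips threshold separating "fast" from "slow" units (from the text).
FAST_THRESHOLD_IPS = 5.0

def mean_firing_rate(spike_times, duration_s):
    """Mean firing rate in impulses per second over the recording."""
    return len(spike_times) / duration_s

def split_units_by_rate(units, duration_s, threshold=FAST_THRESHOLD_IPS):
    """Partition units (name -> spike-time array) into fast and slow groups,
    mapping each unit name to its mean rate."""
    fast, slow = {}, {}
    for name, spikes in units.items():
        rate = mean_firing_rate(spikes, duration_s)
        (fast if rate >= threshold else slow)[name] = rate
    return fast, slow

# Synthetic demonstration data (not real recordings):
rng = np.random.default_rng(0)
duration = 10.0  # seconds
units = {
    "unit_a": np.sort(rng.uniform(0, duration, 80)),  # 8 ips -> fast
    "unit_b": np.sort(rng.uniform(0, duration, 20)),  # 2 ips -> slow
}
fast, slow = split_units_by_rate(units, duration)
print(sorted(fast), sorted(slow))  # ['unit_a'] ['unit_b']
```

In practice the rate would be computed per trial window rather than over a whole recording, but the classification into fast and slow groups proceeds the same way.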
In this study, two human participants were implanted with the aim of understanding how to develop a speech prosthetic. The first participant, locked-in due to a brainstem stroke at the age of 16 years, was implanted in 2004. Multiple publications have described the results [6, 11, 12, 13, 14, 15, 16, 17]. Even though this locked-in participant (number 5 in FDA IDE G960032) could control his single unit firings, the technology available at that time was not amenable to the production of speech.
To better understand decoding of single units associated with the timing of audible and silent speech, one author (PK) decided to have his speech motor cortex implanted in 2014. An important result from studies in both the locked-in and speaking participants is that even though fast-firing units are the most important, slow-firing units are also important for optimal decoding. This issue is addressed in this paper. In addition, results from conditioning of slow single unit firings illustrate the importance of including slow-firing units in decoding, so that even if not initially related to the task they can be conditioned to the task and thus improve decoding capability.
The late David Jayne, a friend with ALS, insisted that one author (PK) focus his efforts on restoration of speech, not movement, because, as he said, “speech will make me human again. I will talk to my children!”. Thus the aim of this effort is to restore at least 100 useful words to those who are mute and paralyzed. The expectation is that they will speak at a conversational rate, or at least at a rate that allows the individual to hold a conversation.
Participant 5 of the FDA study (G960032) gave his permission using upward movements of his eyes for agreement and downward movements for disagreement, in accordance with the informed consent regulations of Neural Signals Inc; the regulations were read to him and his father. His parents also agreed to the regulations. Participant 6 was one of the authors (PK).
Strategy
Archived data from locked-in participant 5 and intact participant 6 (PK) are analyzed during silent speaking of phones using MATLAB’s Classification Learner app. This app provides an accuracy rating of the ability to distinguish between individual phones. To test the contribution of fast- and slow-firing single units to the decoding accuracy, single units are removed from the analysis one at a time and the accuracy is recalculated. In addition, groups of fast- and slow-firing units are removed and the accuracy recalculated. For conditioning of slow single units, data recorded over several days involved locked-in participant 5 listening to a tone, having a quiet period, and then ‘singing’ it in his head. When feedback was provided, conditioning improved and remained stable.
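The leave-one-unit-out procedure above can be sketched as follows. This is a minimal Python stand-in, not the actual analysis: a simple nearest-centroid classifier substitutes for MATLAB's Classification app, the data are synthetic firing-rate features, and all names are illustrative. Each single unit (feature column) is removed in turn and accuracy is recomputed against the full-feature baseline.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Train a nearest-centroid classifier and return test accuracy."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == y_test))

def ablation_accuracies(X_train, y_train, X_test, y_test):
    """Accuracy with each feature (single unit) removed, one at a time."""
    n_units = X_train.shape[1]
    out = {}
    for j in range(n_units):
        keep = [k for k in range(n_units) if k != j]
        out[j] = nearest_centroid_accuracy(
            X_train[:, keep], y_train, X_test[:, keep], y_test)
    return out

# Synthetic firing-rate features: two "phones" (classes), three units;
# unit 0 carries the class information, units 1 and 2 are noise.
rng = np.random.default_rng(1)
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(0.0, 1.0, (n, 3))
X[:, 0] += y * 4.0  # unit 0 separates the two classes

baseline = nearest_centroid_accuracy(X[::2], y[::2], X[1::2], y[1::2])
drops = ablation_accuracies(X[::2], y[::2], X[1::2], y[1::2])
# Removing the informative unit (index 0) degrades accuracy the most,
# which is how task-related units are identified in this scheme.
```

Removing groups of units (e.g. all slow-firing or all fast-firing units at once) follows the same pattern, with `keep` holding the indices of the retained group instead of all-but-one.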