Augmentative and Alternative Communication (AAC) describes ways to support, supplement, or amplify the communication of people who are not verbal communicators. Professionals with expertise in AAC, such as occupational therapists, speech and language pathologists, and rehabilitation technologists, use these AAC systems. For an AAC device, speech recognition, speech reconstruction, and speech generation play a vital role in helping speech-disabled persons to communicate. The review findings have been categorized under several parameters. Input and output, sub-categorized into speech recognition, speech reconstruction, and speech generation, covers the different types of input and output components that can be, and are, used in existing solutions. User background covers the different age groups of people who use AAC devices, ranging from partially speech-disabled to fully speech-disabled. Infrastructure describes the hardware and software components used to build the various web-based or mobile-based solutions, along with the cost-effectiveness of those solutions.
5.1 Input and Output:
An input/output device is any hardware that allows a human operator or another system to interface with a computer. As the name implies, input/output devices can send data to a computer (input) and receive data from it (output). Input devices include touch screens, cameras, keyboards, and microphones, whereas output devices include monitor screens, speakers, and printed paper. The AAC devices mentioned below are mobile-based and web-based. For a mobile-based device, the screen of a smartphone or tablet computer serves as both input and output device: input is given through touch, and output is rendered as graphics on the screen. For a web-based device, the input device can be a camera, which captures data and passes it to the system's processor, or a keyboard, where the input is given as typed text; the output can be the system's monitor screen or a printed paper copy. Here are some of the different AAC devices in which these kinds of input and output components are used.
Caron et al. (2017) suggested AAC apps such as GoTalk Now, AutisMate, and EasyVSD, used by speech-disabled adults, which take pictogram-based touch responses and the mobile keyboard as input modes and use a visual scene display as the output method. Notably, no regional pictograms are used.
Holyfield et al. (2020) propose AAC technology with dynamic text for single-word recognition by adults with intellectual and developmental disabilities such as Down syndrome, ASD, and cerebral palsy; it uses a dynamic display of text in conjunction with voice output as the output. Generalized text communication is used.
Vlachou and Drigas (2017) recommend mobile technology for students and adults with Autism Spectrum Disorders (ASD), with iPad touch responses and a speech-generating device for various purposes such as communication, academics, and entertainment.
Nakkawita et al. (2021) introduce the Proloquo2Go and Speak for Yourself mobile applications for persons with aphasia. These applications use typing and pictogram-based touch responses as input modes and speech as the output.
Janice et al. (2019) insist on visual scene display (VSD) or grid-based AAC apps with transition-to-literacy (T2L) supports (i.e., dynamic text paired with speech output upon selection of a hotspot) for individuals who require AAC and are preliterate.
The following Fig. 4 depicts the existing mobile applications and their input methods, showing how collections of pictograms are used. Regional and conceptualized pictograms are absent in these applications.
5.2 Algorithms:
An algorithm is a finite set of instructions followed in a certain order to resolve a clearly stated computing problem. All facets of computer science, including artificial intelligence, depend on the creation and analysis of algorithms. In the following section, algorithms are categorized under different processes: speech recognition, speech reconstruction, and speech generation. Many of the algorithms mentioned are machine learning techniques: Hidden Markov Models (HMM), Natural Language Processing (NLP), Case-Based Reasoning (CBR), Augmented Reality (AR), surface electromyography (sEMG), Random Forest (RF), Support Vector Machine (SVM), Dynamic Time Warping (DTW), Mel Frequency Cepstrum Coefficients (MFCCs), and Principal Component Analysis (PCA).
(a). Speech Recognition:
E. Essa et al. (2008), I. Mohamad et al. (2016), and M. Rahman et al. (2018) proposed speech recognition systems based on a Support Vector Machine (SVM) with the assistance of Dynamic Time Warping (DTW) for isolated Bangla speech, isolated Arabic words, and voice matching. Data were collected from 40 speakers. Mel Frequency Cepstrum Coefficients (MFCCs) were used as the static features of the speech signal, and after determining the feature vectors, the DTW algorithm was used for feature matching. The proposed model was tested on 12 speakers and achieved a recognition rate of 86.08%. Principal Component Analysis (PCA) was also used for the extraction of image features, and MFCC for voice inputs; that system achieved an accuracy of 87.14% with the DTW method and 92.85% with the Euclidean distance method. Using MFCC, the system could achieve recognition rates of up to 96% for isolated Arabic words.
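The template-matching step these systems share can be sketched compactly. The following is a minimal illustration of DTW-based word recognition, assuming MFCC feature matrices (one row per frame) have already been extracted; the tiny synthetic vectors here merely stand in for real MFCC sequences.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic Time Warping distance between two feature sequences.

    a: (n, d) and b: (m, d) arrays, e.g. per-frame MFCC vectors.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

def recognize(query: np.ndarray, templates: dict) -> str:
    """Return the label of the stored template nearest to the query."""
    return min(templates, key=lambda lbl: dtw_distance(query, templates[lbl]))

# Toy usage: a time-stretched query still matches its template.
template_one = np.array([[0.0], [1.0], [2.0]])
template_five = np.array([[5.0], [6.0], [7.0]])
query = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
print(recognize(query, {"one": template_one, "five": template_five}))  # "one"
```

Because DTW aligns frames non-linearly, the stretched query is still recognized as "one" despite differing in length from its template; this tolerance to speaking-rate variation is why DTW pairs well with MFCC features in isolated-word recognizers.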
Y. Mittal et al. (2015) worked on a voice-controlled smart home based on an IoT connection that could be installed easily and at low cost. The authors suggested that wireless technology could further improve performance. However, their solution lacks authentication and suffers from a distance problem: the closer the user is to the sound device, the better the system works. The developed mobile application system can mitigate this distance problem, which otherwise requires users to be within a certain distance of the microphone.
Geoffrey et al. (2012) and Mark S. et al. (2013) used large-vocabulary continuous speech recognition based on Hidden Markov Models. Here, HMMs are used to deal with the temporal variability of speech, and Gaussian Mixture Models (GMMs) evaluate how well each state of each HMM fits a frame, or a short window of frames, of coefficients representing the acoustic input. This gave an accuracy of 88%. The device can be configured to enable the user to create either simple or complex messages using combinations of a relatively small set of input "words". Using MFCC and the HMM toolkit with the Baum-Welch algorithm, they obtained a recognition accuracy of 67%.
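The core scoring step in such HMM systems is the forward algorithm, which computes how likely a whole observation sequence is under a given word model; the word model with the highest likelihood wins. A minimal sketch for a toy discrete-emission HMM follows (real recognizers use GMM or neural emission probabilities over MFCC frames rather than discrete symbols):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Total likelihood P(obs | model) via the HMM forward algorithm.

    pi:  (S,)    initial state probabilities
    A:   (S, S)  state-transition probabilities
    B:   (S, V)  emission probabilities per state over V discrete symbols
    obs: sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate through transitions, re-weight
    return float(alpha.sum())

# Toy two-state, two-symbol model (hypothetical numbers for illustration).
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward_likelihood(pi, A, B, [0, 1]))  # 0.279
```

Recognition then amounts to evaluating `forward_likelihood` once per trained word model and picking the argmax; the Baum-Welch algorithm mentioned above is the procedure that estimates `pi`, `A`, and `B` from training data.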
The challenge with existing applications used by the speech-disabled is that speech is not available as an input for communication; this motivates the future directions of this research. Table 4 compares the algorithms used by the researchers, giving the recognition rate and indicating which algorithm fits speech recognition best. MFCC is reported to work well, giving the highest accuracy.
(b). Speech Reconstruction:
Several papers highlight the issues of designing a VIVOCA (Voice Input Voice Output Communication Aid) for users with severe speech disorders and dysarthria [25–27]. In [28], Hawley et al. present a VIVOCA solution as a form of AAC device for people with severe speech disabilities: the researchers applied statistical Automatic Speech Recognition (ASR) techniques, based on HMMs, to dysarthric speakers to produce speaker-dependent recognition models.
W. Farzana et al. (2020), Uchoa et al. (2021), and Neamtu et al. (2019) highlight the use of Machine Learning (ML) and AI in AAC communication. LIVOX is a machine-learning-based Android mobile application that recommends pictograms based on the location and time of the user's device. Its notable features are an artificial-intelligence-based recommendation system that analyses past user data (used item, utilization time, touch time, GPS data, and X and Y coordinates on the touch screen) using a random forest classification algorithm; Natural Language Processing (NLP) for speech recognition; and a feedback framework based on Case-Based Reasoning (CBR).
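The recommendation idea behind LIVOX, ranking pictograms by contextual features such as time and location, can be illustrated with a deliberately simplified frequency model. LIVOX itself uses a random-forest classifier over richer features; the time bins, place names, and pictogram labels below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

class ContextRecommender:
    """Toy pictogram recommender: most-used pictogram per (time-of-day, place) bin."""

    def __init__(self) -> None:
        # Maps a context bin to a count of pictograms chosen in that context.
        self.history: defaultdict = defaultdict(Counter)

    @staticmethod
    def _bin(hour: int, place: str) -> tuple:
        return ("morning" if hour < 12 else "evening", place)

    def record(self, hour: int, place: str, pictogram: str) -> None:
        """Log one pictogram selection together with its context."""
        self.history[self._bin(hour, place)][pictogram] += 1

    def recommend(self, hour: int, place: str) -> Optional[str]:
        """Return the most frequent pictogram for this context, if any."""
        counts = self.history.get(self._bin(hour, place))
        return counts.most_common(1)[0][0] if counts else None

# Toy usage: mornings at school are dominated by the "book" pictogram.
rec = ContextRecommender()
rec.record(8, "school", "book")
rec.record(9, "school", "book")
rec.record(8, "school", "water")
rec.record(19, "home", "food")
print(rec.recommend(10, "school"))  # "book"
```

A random forest generalizes this idea by learning splits over many features at once (touch coordinates, utilization time, GPS) instead of hand-chosen bins, but the input-to-recommendation flow is the same.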
Gay et al. (2011) and Costa et al. (2017) note in their research that 1 in 68 children has Autism Spectrum Disorder (ASD). Their intention is to develop a task-recommendation system that uses a Case-Based Reasoning machine learning technique to supplement the child's regular therapy. Besides task recommendation, the application allows closer monitoring by parents and better coordination with the therapists, contributing to improved outcomes in the child's development.
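At its core, Case-Based Reasoning retrieves the stored case most similar to the current situation and reuses its solution. A minimal retrieval step might look like the following; the feature encoding and task labels are purely hypothetical stand-ins for whatever profile data such a therapy-support system would actually store.

```python
import numpy as np

# Hypothetical case base: (feature vector, task that worked for that case).
# Features could encode e.g. age, verbal-ability score, attention-span rating.
CASES = [
    (np.array([4.0, 1.0, 2.0]), "picture matching"),
    (np.array([6.0, 3.0, 5.0]), "two-word phrases"),
    (np.array([8.0, 5.0, 7.0]), "short story retell"),
]

def retrieve(query: np.ndarray) -> str:
    """Return the solution of the nearest stored case (Euclidean distance)."""
    distances = [np.linalg.norm(query - feats) for feats, _ in CASES]
    return CASES[int(np.argmin(distances))][1]

# A new child profile close to the first stored case gets that case's task.
print(retrieve(np.array([4.2, 1.1, 2.0])))  # "picture matching"
```

Full CBR systems add the "revise" and "retain" steps: the therapist's feedback on the recommended task is folded back into the case base, which is what the feedback loop described above enables.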
Liu et al. (2017) insist that Augmented Reality (AR) may be especially effective, as it can deliver visual and auditory cues while the user is simultaneously engaged in natural or structured social interactions. Additionally, wearable technologies contain sensors that can record and quantitatively monitor a user's interactions. Their report highlights the need for further research into augmented reality technologies as a therapeutic tool for people with ASD.
Elsahar et al. (2019) review a range of signal sensing and acquisition methods used in conjunction with existing high-tech AAC platforms for individuals with a speech disability, including imaging methods, touch-enabled systems, mechanical and electro-mechanical access, breath-activated methods, and Brain-Computer Interfaces (BCI). The listed AAC sensing modalities are differentiated in terms of ease of access, affordability, difficulty, and typical conversational speeds. Their review of the associated AAC signal processing, encoding, and retrieval highlights the roles of Machine Learning (ML) and Deep Learning (DL) in the development of intelligent AAC solutions.
Geoffrey S. et al. (2017) trained phoneme-based recognition models using a unique set of phrases for each of the 39 commonly used phonemes in English, with the remaining phrases used to test word recognition based on phoneme identification from running speech. This study provides a compelling proof of concept for sEMG-based laryngeal speech recognition, with strong potential to further improve recognition performance.
(c). Speech Generation:
Clients who understand the language(s) spoken in their communities would arguably benefit from AAC systems that are aligned with these languages. Considerations may include the form of the AAC symbols. For literate clients making use of orthographic symbols, the link between the written and spoken forms of the language is typically clear. Multilingual clients would merely need access to the relevant orthographic symbols for the different languages in which they want to communicate. If access to synthetic speech generation is also desired, speech generation engines in the relevant spoken languages would need to be available. The development of such speech engines is a technically challenging task that has been accomplished for the global majority languages but is still lagging for many minority languages (Calteaux et al., 2013; Louw, 2008; Titmus et al., 2016). Zhou et al. (2021) propose a language-agnostic speaker embedding mechanism for cross-lingual personalized speech generation; the proposed network adopts an encoder-decoder architecture to disentangle language information from the latent representation via multi-task learning.
Table 4 compares the algorithms used to recognize and reconstruct speech to build an AAC device. It correlates the process flow of speech recognition and speech reconstruction with the algorithms used under each process, their recognition accuracy rates, and the purpose of the AAC usage.
Table 4
Comparison of algorithms used to recognize and reconstruct speech, to build an AAC device

| Process Flow | Algorithm Used | Purpose / Medical Condition | Recognition Rate |
|---|---|---|---|
| Speech Recognition | SVM, DTW, MFCC | Isolated Bangla speech recognition | 86.08% |
| Speech Recognition | MFCC | Speech recognition system for isolated Arabic words | 96% |
| Speech Recognition | PCA, MFCC, DTW | Voice matching of Hijaiyah letters | 87.14% |
| Speech Recognition | HMM | Voice-controlled multi-functional smart home application | 88% |
| Speech Recognition | MFCC, HMM, Baum-Welch algorithm | Portable voice output communication aid controllable by automatic speech recognition | 67% |
| Speech Reconstruction | HMM | Dysarthria | Not found |
| Speech Reconstruction | RF, NLP, CBR | Non-verbal people | Not found |
| Speech Reconstruction | CBR | Autism | Not found |
| Speech Reconstruction | AR | Autism | Not found |
| Speech Reconstruction | NLP | Non-verbal people | Not found |
| Speech Reconstruction | sEMG | Learning disability | Not found |

Abbreviations: Hidden Markov Model (HMM), Natural Language Processing (NLP), Case-Based Reasoning (CBR), Augmented Reality (AR), surface electromyography (sEMG), Random Forest (RF), Support Vector Machine (SVM), Dynamic Time Warping (DTW), Mel Frequency Cepstrum Coefficients (MFCCs), Principal Component Analysis (PCA).
5.3 User Background:
The users of these AAC applications vary from infants to adults and from partially speech-disabled to fully speech-disabled people.
An et al. (2017) included in their study ten children aged 3–6 years who had been diagnosed with ASD based on the criteria in the Diagnostic and Statistical Manual of Mental Disorders and who may or may not have been able to communicate using speech or language; the children had not been diagnosed with or suspected of having epilepsy and had never previously used any other speech-generating device or the Picture Exchange Communication System (PECS).
C. Wilson et al. (2018) undertook a field study over two school terms in an autism-specific primary school with 12 minimally verbal children aged 5 to 8, together with their teachers and speech therapists. In Human-Computer Interaction (HCI), ability-based design addresses the importance of interactive technologies that adapt to users' abilities, skills, and contexts, as opposed to catering solely to their perceived inabilities.
Tariady et al. (2017) studied 12 autistic children in an inclusion school in the Pekalongan region, using an experimental method with a single-subject research approach. The research concluded that the average communication ability level was 47% before the treatment and 65% during the treatment, improving to an average of 76% after the intervention stage of the multimedia augmented-reality-based PECS method.
Shelley et al. (2017) explore how Speech-Language Pathologists (SLPs) who are AAC specialists approach the assessment process, using two case studies: one child with cerebral palsy and one with autism spectrum disorder. The results of their study provide an outline of an assessment protocol for children with complex communication needs.
Sarah Creer et al. (2016) take an epidemiological approach to determine the prevalence of people who could benefit from AAC in the UK. They find that 97.8% of the people who could benefit from AAC have one of nine medical conditions: dementia, Parkinson's disease, autism, learning disability, stroke, cerebral palsy, head injury, multiple sclerosis, and motor neurone disease. In total, an estimated 536 people per 100,000 of the UK population (approximately 0.5%) could benefit from AAC.
Studies of AAC implementation show that it has been evaluated with only a limited number of speech-disabled people, ranging from infants to adults.
5.4 Infrastructure:
The infrastructure of an AAC device refers to the digital technologies that provide the basic functionality for a device’s operations. Examples of infrastructure include hardware and software components along with the cost-effectiveness required for the AAC device.
Castro et al. (2018) present iCommunicate, a sign language translator application system using Kinect technology. C#, Kinect Studio, Visual Gesture Builder, and Kinect for Windows V2 are the software and hardware used in this system. Language-Integrated Query (LINQ) makes strongly typed queries a first-class language construct, and as an object-oriented language, C# supports encapsulation, inheritance, and polymorphism.
Ramon et al. (2017) implemented a device with a KY-038 microphone sound sensor module, a WTV020M01 MP3 voice module with SD card, and a Gizduino ATmega168 with an LCD; the program running on the microcontroller is the software that makes the AAC device work.
N.H. Voon et al. (2015) developed the AutiSay application for the hand-held Apple iPad; it requires iOS 6.1 or later. The device is equipped with a built-in camera, microphone, and Wi-Fi capabilities, which the application needs for customization and for accessing the App Store.
Haneen et al. (2015) describe an application that is downloaded onto the teachers'/parents' smartphones to use the settings page, and onto the children's smartphones to use the pages provided to them by the application. The GPS service provided by the phones is used to determine the children's location.
M. Manickavelu and Anil (2013) present KAVI-PTS, an AAC application built on the Android OS. It can be installed on any Android tablet or smartphone; it requires Android OS 2.3.3 or above with 512 MB of RAM, and internet connectivity is preferred for downloading the application from the server.
While AutiSay: A Mobile Communication Tool for Autistic Individuals focuses on autistic children, there is significant potential for the application to be scaled to autistic adults (N.H. Voon et al., 2015). The application is relatively inexpensive, as it is distributed on the Apple App Store at a relatively low price in comparison to similar applications.
Voigt et al. (2014) developed Delayed Auditory Feedback (DAF), an inexpensive mobile application that incorporates the insights the authors gained over ten years of research on stuttering, while minimising hardware and device expenses.
Muharib et al. (2018) note that GoTalk Now is a relatively inexpensive, open-source application. Using an iPad-based SGD may be feasible in educational settings; in addition, family members may be taught to use an iPad as an SGD to support their children's communication at home and in the community.
Trishna et al. (2017) describe Avaz, an iOS-based application that helps autistic kids by providing picture-based communication and natural voices in multiple languages. Roger Voice is a free-of-cost cross-platform application for people with difficulty hearing: using Roger Voice, people with hearing problems can comprehend phone calls by reading. Vaakya is a free-of-cost Android-based Augmentative and Alternative Communication application that is helpful for people with speech disabilities, especially those with autism, cerebral palsy, etc. The application contains a combination of images and phrases through which users can communicate effectively in the Hindi language.
Fig. 5 above represents the cost of the existing AAC mobile applications; the future directions of research and the conclusion follow.