We report our results in five sections. In Section 3.1, we analyze the included papers with respect to paper counts, authors, regions, languages, and venues. In Sections 3.2–3.5, we present our findings for each of the research questions addressed in this systematic literature review.
3.1 Paper Counts, Authors, Regions, Languages, and Venues
After the full-text review, 24 papers from 23 different venues were selected. The number of published studies on AI-based automated speech therapy tools for persons with SSD shows an upward trend over the years (see Figure 3): 20 of the 24 papers were published in the last six to seven years.
The majority of the papers included in this study were published in journals (see Figure 4); of the 24 included studies, ten were published in conference proceedings and two as book chapters. However, we could not find any eligible studies published in magazines.
We identified 91 unique authors across the included studies. The VOSviewer software was used to identify the most impactful authors, generate co-authorship clusters, and perform keyword co-occurrence analysis [11]. All authors were counted irrespective of authorship order, with the same weight applied to each; authors publishing more articles therefore accumulated higher weight. To rank the most impactful authors, their collaboration links were considered along with the number of published documents. The top ten most impactful authors are listed in Table 1, and the largest cluster of authors by article count and collaborative link strength is shown in Figure 5. It is worth noting that 79 authors (86.81%) contributed to only one of the included papers, i.e., published only one work on AI-based automated speech therapy in the last 15 years. Moreover, after analyzing the authors' keywords of the included studies, the largest cluster of linked and co-occurring keywords was found, as shown in Figure 6; the most significant keyword was ASR (Automatic Speech Recognition).
Table 1 Top ten most impactful authors

| Author             | Documents | Total Link Strength |
|--------------------|-----------|---------------------|
| Lopez-Nores, M.    | 4         | 18                  |
| Pazos-Arias, J.    | 4         | 18                  |
| Robles-Bykbaev, V. | 4         | 18                  |
| Guaman-Heredia, M. | 2         | 11                  |
| Quisi-Peralta, D.  | 2         | 11                  |
| Lee, T.            | 2         | 10                  |
| Ng, C.W.Y.         | 2         | 10                  |
| Ng, S.I.           | 2         | 10                  |
| Wang, J.           | 2         | 10                  |
| Garcia-Duque, J.   | 2         | 9                   |
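VOSviewer's "total link strength" for an author is the sum, over all co-authors, of the number of documents the pair co-authored. A minimal sketch of how the document counts and total link strengths in Table 1 can be derived from raw author lists (the paper data below is invented for illustration, not drawn from the included studies):

```python
from collections import defaultdict
from itertools import combinations

def link_strengths(papers):
    """Compute per-author document counts and total co-authorship link strength.

    `papers` is a list of author-name lists, one per paper. The link strength
    of an author pair is the number of papers they co-authored; an author's
    total link strength sums this over all of their co-authors.
    """
    docs = defaultdict(int)   # author -> number of documents
    pair = defaultdict(int)   # (author_a, author_b) -> co-authored papers
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            docs[a] += 1
        for a, b in combinations(unique, 2):
            pair[(a, b)] += 1
    total = defaultdict(int)  # author -> total link strength
    for (a, b), n in pair.items():
        total[a] += n
        total[b] += n
    return docs, total

# Illustrative example: three papers with overlapping author sets.
papers = [["A", "B", "C"], ["A", "B"], ["B", "D"]]
docs, total = link_strengths(papers)
# "B" appears in 3 papers; links: B-A (2) + B-C (1) + B-D (1) = 4
```

Authors with equal document counts are then ranked by total link strength, which is how ties among the two-document authors in Table 1 are ordered.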
We further report the geographical distribution of the included studies based on the study location indicated in each paper (see Figure 7), consulting the authors' affiliations and funding agencies when required. Most papers reported studies conducted in Europe (11 papers) and North America (6 papers). The European studies include four from Spain and one each from Germany, Hungary, Romania, Portugal, the Czech Republic, and Italy. The North American studies include four from the USA, one collaborative study between Panama and Nicaragua, and one from Mexico. Moreover, five papers reported studies in Asia: China (2 studies), India (1 study), Taiwan (1 study), and the Philippines (1 study). However, other continents are heavily underrepresented; Africa and Oceania each contributed only one study. Finally, we could not find any eligible studies meeting our selection criteria that were conducted in South America.
We present the language distribution of the papers based on the language addressed by the AI-based automated speech therapy tools as reported in the studies (see Figure 8). The most addressed languages were English (10 studies) and Spanish (4 studies). Two studies addressed Cantonese, and one study each addressed Punjabi, German, Hungarian, Romanian, Portuguese, Italian, Arabic, and Mandarin. The studies were drawn from 23 unique venues, the vast majority of which (22 venues, 95.65%) were represented by only one article. Only one venue, "Studies in Health Technology and Informatics," published two of the papers included in this review.
3.2 Speech Sound Disorders (RQ1)
We found that researchers have addressed multiple types of SSD in the literature. However, 12 of the 24 studies did not address any specific SSD (see Figure 9); these studies proposed automated tools for a generalized SSD population and conducted experiments without specifying any particular SSD [8, 12–22]. Researchers have also devised AI-based tools specifically for persons with hearing impairment [23, 24]. A novel tongue-based human-computer interaction tool [25] and a gamified AI-based tool [7] have been proposed for persons with motor speech disorders.
Moreover, Frieg et al. proposed a digital training system for dysarthric patients [26]. In a similar study, Saz et al. devised ASR-based tools and technologies and conducted user studies specifically with dysarthric patients [27]. Singh et al. and Chen et al. developed and assessed automated AI-based speech therapy tools for articulation disorders in Punjabi and Mandarin, respectively [28, 29]. Ballard et al. conducted a feasibility study of a tablet-based automated feedback tool for patients with apraxia [30]. Ramamurthy et al. developed a novel companion robot, "Buddy," for children with cleft lip and palate disorder [31]. In another study, Rivas et al. proposed using a virtual world to provide speech therapy for children with dyslalia [32]. It is worth noting that only one study concerned speech data collection for Cantonese to perform phonology and articulation assessment [13]. Figure 10 shows the distribution of papers addressing specific SSDs.
3.3 Level of Autonomy (RQ2)
Researchers worldwide have amplified the debate between autonomy and human control due to the risks and concerns associated with AI and large-scale automation [33]. In this context of automation and AI in speech therapy, we studied the level of autonomy achieved by AI-based automated speech therapy tools. In many studies, researchers built fully automated AI-based speech therapy tools without considering the role of parents, SLPs, and other stakeholders. While Desolda et al. emphasized the role of caregivers and SLPs in the design of a remote therapy tool, "Pronuntia" [12], Ng et al. proposed a fully automated assessment tool using the CUChild 127 speech corpus in Cantonese [14]. In another study, Bilkova et al. developed a novel lip, tongue, and teeth detection system using a Convolutional Neural Network (CNN) and Augmented Reality (AR) to support the automatic evaluation of speech therapy exercises [25].
Furthermore, Sztaho et al. proposed a fully automated speech therapy tool that displays visual feedback on intensity (accent), intonation, and rhythm to children with hearing impairments [23]. In a similar study, Hernández et al. developed a serious game with an automatic feedback feature for hearing-impaired children [24]. Ballard et al. performed a feasibility study of their tablet-based, fully automated therapy tool for children with apraxia, without any role for SLPs or other stakeholders [30]. Moreover, V. Robles-Bykbaev et al. proposed a framework imitating the main functions of an SLP, along with a robotic assistant that motivates children during therapy activities and automatically gives real-time feedback [8, 17–19]. Similarly, Ramamurthy et al. proposed a companion robot, "Buddy," which automatically evaluates the speech exercises of children with CL/P disorders and supports monitoring by SLPs [31].
3.4 Modes of Intervention (RQ3)
Researchers have adopted different modes of intervention while implementing AI-based automated speech therapy tools for persons with SSD. As these therapies are often targeted at children, researchers emphasize developing tools that trigger excitement and build companionship. Desolda et al. proposed a web application for children, SLPs, and caregivers, allowing SLPs to assign therapy exercises to children with SSD [12]; the system automatically evaluates the correctness of the exercises and gives real-time feedback. On the other hand, Ballard et al. proposed a tablet-based therapy tool for children with apraxia [30]. Furthermore, Ng et al. and Sztaho et al. proposed a computer-based prosody teaching system for children with hearing impairment and a computer-based visual feedback system for the hearing impaired, respectively [14, 23]. Bykbaev et al. proposed a novel robotic assistant along with a fully automatic framework imitating the work of an SLP [8]. In a similar study, Ramamurthy et al. proposed a therapy robot, "Buddy," allowing children to practice assigned exercises at home [31]. Many studies have incorporated serious games as an intervention for automated speech therapy [7, 21, 25, 31, 34]; one study used augmented reality to build a serious game based on tongue detection [25].
3.5 Effectiveness (RQ4)
The effectiveness of AI-based automated speech therapy tools depends on their performance compared to SLPs; moreover, automated speech therapy tools providing wrong feedback can be disastrous to children's speech improvement. Few studies (4 out of 24) compared the results of their automated tool with human experts (SLPs) (see Figure 12). Ballard et al. conducted an inter-rater agreement test between their ASR tool and SLPs and found that ASR-human agreement averaged 80% [30]. In another study, Sztaho et al. found that their automated tool's scores correspond to the subjective evaluations by SLPs [23]. Bykbaev et al. found that over 90% of the therapy plans generated automatically by their expert system "Spelta" were "better than" or "as good as" what the SLPs would have created manually [18]. Moreover, in the study by Saz et al., the Automatic Speech Recognition (ASR) and Pronunciation Verification (PV) modules based on impaired speech utterances provided performance similar to SLPs [27].
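Percent agreement is one common way to quantify the kind of ASR-human agreement reported above: the fraction of items on which the tool and the clinician give the same correct/incorrect judgment. A minimal sketch (the judgments below are invented for illustration; the exact protocols of the cited studies may differ):

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters give the same label."""
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must be the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Illustrative correct (1) / incorrect (0) judgments for ten utterances.
asr = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
slp = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
agreement = percent_agreement(asr, slp)  # 8 of 10 items match -> 0.8
```

Note that raw percent agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are often reported alongside it for this reason.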