2.1 Speech Recognition Technology
Automatic Speech Recognition (ASR) is the technology that translates spoken language into text, mediating between the user and the computer. ASR is an independent, computer-driven transcription of spoken language into readable text in real time [5]. Previous studies identify three ASR models: embedded speech recognition, speech recognition in the cloud, and distributed speech recognition. The authors of [6] compared the advantages and disadvantages of these three ASR models on mobile devices and proposed a fourth model, which they called shared speech recognition with user-based adaptation. They also noted that assessing the accuracy of ASR on mobile devices is affected by several factors, for example limited available storage space, small caches, low processing power, and the language used.
The authors of [7] presented a versatile post-processing technique based on phonetic distance that integrates domain knowledge to improve ASR performance. They evaluated Google ASR on domains of differing size, and their results show both the gap between the superior acoustic-modeling performance of cloud-based speech recognition services and relatively weak open-source acoustic models, and the strong performance impact of the language restrictions provided by domain knowledge.
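To illustrate the general idea of this kind of post-processing (a minimal sketch under stated assumptions, not the implementation in [7]; the toy phoneme dictionary, the smart-home phrase set, and the function names are all illustrative), an ASR hypothesis can be mapped to the phonetically closest phrase in a domain vocabulary by edit distance over phoneme sequences:

```python
# Minimal sketch of phonetic-distance post-processing for ASR output.
# The phoneme dictionary below is a toy assumption; a real system would
# use a grapheme-to-phoneme converter and a domain-specific vocabulary.

def levenshtein(a, b):
    """Edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Toy ARPAbet-like phonetic forms for an assumed smart-home domain.
DOMAIN_PHRASES = {
    "turn on the light": ["T", "ER", "N", "AA", "N", "DH", "AH", "L", "AY", "T"],
    "lock the door":     ["L", "AA", "K", "DH", "AH", "D", "AO", "R"],
}

def rescore(asr_phonemes):
    """Map an ASR hypothesis to the phonetically closest domain phrase."""
    return min(DOMAIN_PHRASES,
               key=lambda p: levenshtein(asr_phonemes, DOMAIN_PHRASES[p]))

# A misrecognized "turn on the light" still resolves to the right command.
print(rescore(["T", "ER", "N", "AA", "N", "DH", "AH", "N", "AY", "T"]))
```

The design choice here is simply to constrain recognition output to an in-domain phrase inventory, which is one way domain knowledge can compensate for acoustic-model weaknesses.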
The study in [8] presented the HomeServices system, which uses cloud-based automatic speech recognition. The authors adapted personal devices for the homes of people who require voice-controlled assistive technology. They described the idea of a PAL (Personal Adaptive Listener) and outlined a methodology called the “Virtuous Circle”. This study shows that server-based ASR technology can be built around an internal team of people who use the devices as personal assistants.
In [9], Google Speech was evaluated on English recordings of two Swedish speakers in terms of word error rate (WER) and transcription speed. The author compared it with another speech recognizer, Pocketsphinx, using the same recordings and measurements. The results show that Google Speech achieved 28.5%, 30.27%, 46.1%, and 39.94% lower WER scores than Pocketsphinx.
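For context, WER is computed from the minimum-edit alignment between the recognizer output and a reference transcript of $N$ words, counting word substitutions ($S$), deletions ($D$), and insertions ($I$):

\[
\mathrm{WER} = \frac{S + D + I}{N}
\]

so a lower WER indicates a more accurate transcription.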
2.2 Usability Evaluation on Mobile
The authors of [10] evaluated the performance of two popular cloud-based speech recognition applications, Apple’s Siri and Google Speech Recognition (GSR), under various network conditions. They provided two models to evaluate the transcription delay and transcription accuracy of each application under different packet-loss and jitter values, arguing that the delay and accuracy of the voice recognition process are important parameters affecting the quality of the user experience with cloud-based speech recognition applications. The results show that the performance of cloud-based speech recognition systems can be degraded by jitter and packet loss, which occur frequently over Wi-Fi and cellular network connections.
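A simple way to instrument such an evaluation (a hypothetical sketch, not the models from [10]; `transcribe` stands in for whichever cloud ASR client is under test, and the word-level `levenshtein` helper from the earlier sketch is reused to count edit errors) is to time each network round trip and score each output against its reference:

```python
import time

def evaluate(transcribe, audio_clips, references):
    """Measure mean transcription delay and WER for a cloud ASR client.

    `transcribe` is assumed to be a blocking call that sends one audio
    clip over the network and returns the recognized text, so network
    effects such as jitter and packet loss are reflected in the delay.
    Reuses `levenshtein` from the earlier sketch, applied to word lists.
    """
    delays, errors, words = [], 0, 0
    for clip, ref in zip(audio_clips, references):
        start = time.monotonic()
        hypothesis = transcribe(clip)           # full network round trip
        delays.append(time.monotonic() - start)
        ref_words = ref.split()
        errors += levenshtein(hypothesis.split(), ref_words)
        words += len(ref_words)
    return sum(delays) / len(delays), errors / words  # mean delay, WER
```

Running the same clips under controlled packet-loss and jitter settings would then expose how each condition shifts the two measurements.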
The work in [11] presented a heuristic evaluation checklist for mobile interfaces, which was experimentally evaluated as a design tool. The results show that mobile usability involves different kinds of devices, contexts, tasks, and users. The authors distinguished three types of mobile interfaces. The first type is the feature phone, defined as “basic handsets with tiny screens and very limited keypads that are suitable mainly for dialing phone numbers.” The second type is the smartphone, defined as “phones with midsize screens and full A-Z keypads.” The third type is the touch phone/touch tablet, defined as “devices with touch-sensitive screens that cover almost the entire front of the phone.” This study shows that the new checklist is useful for avoiding usability gaps in mobile interfaces, even for new developers.
The study in [12] presented generic heuristics and developed a new set based on Jakob Nielsen's heuristics, deriving the initial usability heuristics from them. The generic usability heuristics were developed using a conceptual-theoretical approach suitable for evaluating MMAs, because Nielsen's set of heuristics is not suitable for MMAs. The study was implemented in four phases. First, the authors explored Nielsen's generic heuristics, particularly for evaluating the usability of MMAs. Second, they familiarized themselves with domain-specific heuristics. Third, they formulated the new heuristics.