Nouns and Verbs Identify Different Subtypes of MCI


 It is well-documented that patients with semantic dementia and Alzheimer’s disease present with difficulty in lexical retrieval and reversal of the concreteness effect in nouns and verbs. Little is known about the lexical phenomena before the onset of symptoms. We anticipate that there are linguistic signs in the speech of people who suffer from mild cognitive impairment (MCI), the prodromal stage of dementia. Here, we report the results of a novel corpus-linguistic approach to the early detection of cognitive impairment. We recorded 40 hours of natural, unconstrained speech of 188 English-speaking Singaporeans; 90 are diagnosed with MCI (51 amnestic, 39 nonamnestic), and 98 are cognitively healthy. The recordings yield 327,470 words, which are tagged for parts of speech. We calculate the per-minute speech rates and concreteness scores of nouns and verbs, and of all tagged words, in our dataset. Our analysis shows that the two measures of nouns and verbs identify different subtypes of MCI. Compared with healthy controls, subjects with amnestic MCI produce fewer but more abstract nouns, whereas subjects with nonamnestic MCI produce fewer but more concrete verbs. Cognitive impairment is manifested in ordinary language before the presentation of clinical symptoms, and can be detected through non-invasive corpus-based analysis of natural speech.


Main Text
In recent years, there has been a growing body of research on the distinct roles that nouns and verbs play in cognitive impairment. It has been well documented that patients diagnosed with semantic dementia (semantic variant of frontotemporal aphasia) or Alzheimer's disease present with effortful lexical retrieval and reversal of the concreteness effect, producing nouns and, to a lesser extent, verbs that are more abstract. [1][2][3][4][5][6][7] Most studies collect targeted language data from word-based fluency tests on semantic categories (cat, dog) or letters (cat, cake), or from connected speech obtained through structured interviews, picture narrations or fairy tale recalls. The datasets of these studies are relatively small, even those constructed from connected speech, and the speech data are constrained by visual or reading stimuli and by research designs. To our knowledge, there has been little or no study of the speech rate and concreteness of nouns and verbs in patients with mild cognitive impairment (MCI), the prodromal stage of dementia. In the present study, we take a novel, corpus-linguistic approach to search for linguistic markers of early cognitive impairment in natural, unconstrained speech. Our analysis shows that people with MCI, especially amnestic MCI, experience lexical retrieval difficulties and reversal of the concreteness effect in nouns, consistent with the symptoms associated with semantic dementia and Alzheimer's disease.
Our language dataset comes from participants in the Community Health Intergenerational Study, a cohort study of ageing and mental health among Singaporeans 60 years of age or older. 8 Assessments include physical health, socioeconomic conditions, cognitive functioning and unconstrained speech. The neuropsychological battery of tests used to diagnose normal ageing and MCI in the study has been validated in the Singaporean population. 9 The tests evaluate the amnestic and non-amnestic cognitive domains of attention, learning, memory, speed, and executive function, which are necessary for the diagnosis of neurocognitive disorders. The battery used in the cognitive assessment consists of five tests: (1) the Rey auditory verbal learning test, to evaluate declarative verbal learning and memory; (2) the immediate, delayed and recognition memory test and forward and backward digit span tasks, to assess attention and verbal working memory; (3) color trail tests 1 and 2, to assess sustained attention and reasoning; 10 (4) block design, to measure visual-spatial and organizational processing abilities and non-verbal problem-solving skills; and (5) the semantic verbal fluency (animal) test, to tap lexical knowledge and semantic memory organization. 11 The diagnosis for each subject was arrived at through consensus review of the test results by two psychiatrists and one neuropsychologist.
Most Singaporeans are multilingual, speaking English, Chinese, Malay or Tamil. Participants were instructed to talk about any topic for up to 20 minutes in the language they felt most comfortable with, with minimal involvement from interviewers. The speech was recorded with simple digital voice recorders in an ordinary office setting. Topics varied freely and widely, ranging from work and retirement to family life and public affairs. In total, more than 475 participants provided speech samples in English. Among them, 96 were diagnosed with MCI. For our dataset, we selected 90 subjects with MCI who are in their 60s and 70s. We also selected 98 cognitively healthy participants with similar age, gender, education and language profiles. The basic demographic information of the subjects who contributed speech samples to the dataset is shown in Table 1. The recordings were transcribed verbatim by Singaporean students at the National University of Singapore who are familiar with the local languages. The transcribers were not involved in the initial recording sessions. We used the Stanford PoS tagger to tag the transcribed words for parts of speech, based on the Penn Treebank tagset. 12,13 The tagged words were manually vetted by another group of students trained in formal linguistics and in the descriptive grammar of English. The raw recordings contained interviewer-subject interactions. For each recording, we removed the words uttered by the interviewer as prompts, encouragements, and explanations, and adjusted the time of the recording accordingly. We also removed the time for pauses and hesitations during the interactions between the interviewer and the subject, but kept the subject's own pauses, repetitions and false starts during their continuous speech. After the adjustment, the remaining time of a recording is the net talk time of the subject. Table 2 displays the basic statistics of the dataset.
As a group, people with MCI talk less, and produce fewer words, than healthy controls. The declines in the four measures are statistically significant (p<0.001).
For each subject's speech sample, we calculated the speech rates of nouns, verbs and all tagged words by dividing the total number of noun, verb or word tokens by the total number of minutes. We calculated the concreteness score of each subject's speech sample based on the concreteness ratings of 40,000 English words compiled by Brysbaert and colleagues, with 1 being the most abstract and 5 being the most concrete. 14 For each word or lemma, we obtained the rating from the database and multiplied it by the number of tokens of the word to arrive at the word's token score. The sum of all word token scores is then divided by the total number of word tokens to yield the concreteness score of the speech sample of each subject. About 3.4% of common noun lemmas and 0.6% of verb lemmas in our speech samples do not appear in the database and are not rated. Also not rated are proper nouns (Singapore, Singaporean), foreign-origin words (tau 'soybean') and sentence-final particles unique to Singapore English (lah). These words were not included in the calculation. We also excluded numerals and common nouns used as proper nouns (Elm Street).
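The two measures described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy ratings and the toy lemma list are hypothetical, and the sketch assumes that the denominator of the concreteness score counts only tokens whose lemma has a rating, since unrated words are excluded from the calculation.

```python
from collections import Counter


def speech_rate(token_count, net_minutes):
    """Tokens per minute of net talk time."""
    return token_count / net_minutes


def concreteness_score(lemmas, ratings):
    """Token-weighted mean concreteness of a speech sample.

    lemmas  : one lemma per token produced by the subject
    ratings : lemma -> concreteness rating (1 = most abstract, 5 = most concrete)
    Lemmas without a rating are skipped (assumption: they are also left
    out of the denominator).
    """
    counts = Counter(lemmas)
    total_score = 0.0
    rated_tokens = 0
    for lemma, n_tokens in counts.items():
        rating = ratings.get(lemma)
        if rating is None:
            continue  # unrated: proper nouns, particles such as "lah", etc.
        total_score += rating * n_tokens  # the word's token score
        rated_tokens += n_tokens
    if rated_tokens == 0:
        return None  # nothing to average
    return total_score / rated_tokens


if __name__ == "__main__":
    # Toy ratings for illustration only; NOT values from the published norms.
    toy_ratings = {"bowl": 5.0, "idea": 1.0}
    toy_lemmas = ["bowl", "bowl", "idea", "lah"]
    print(concreteness_score(toy_lemmas, toy_ratings))
    print(speech_rate(token_count=120, net_minutes=10.0))
```

Counting each lemma once and weighting it by its token frequency gives the same result as averaging the ratings over all rated tokens, which is the calculation the paragraph describes.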
The results are shown in Table 3. The results, and the dissociation between nouns and verbs, are consistent with the findings reported in the linguistic and neuropsychological literatures. In formal linguistics, nouns and verbs are recognized as the two major lexical categories, despite the enormous cross-linguistic diversity in morphosyntactic form. [15][16][17] In cognitive neuroscience, there has been extensive evidence, from brain lesion studies to batteries of word-based neuropsychological tests, that nouns and verbs are encoded in different areas of the brain, although the exact neural mechanisms in the lexical representation of grammatical categories remain a matter for debate. [18][19][20][21][22] Patients with semantic dementia and Alzheimer's disease suffer from semantic memory deficits which affect the perceptual attributes of semantic knowledge, resulting in difficulty in lexical retrieval and in more abstract speech, that is, the reversal of the concreteness effect. [1][2][3][4][5][6][7] People with MCI, especially amnestic MCI, also present with impairments in semantic memory, as reported in studies drawing data from common word-based neuropsychological tests, such as object and face naming tests, that target semantic memory directly. [23][24][25][26][27] Studies that draw data from connected speech, typically from picture descriptions or story recalls, are not as conclusive. 28-29 To our knowledge, there has been no study on whether the semantic memory deficits in people with amnestic MCI lead to deficits in noun retrieval and to reversal of the concreteness effect. The results of our study provide the first conclusive evidence that semantic memory-related deficits are manifested in ordinary language before the presentation of clinical symptoms of full-fledged dementia.
To conclude, we took a novel, corpus-linguistic approach to search for linguistic markers of cognitive impairment. The natural speech data obtained from people talking about familiar topics of daily life reflect the mental state of the language more closely and intimately than the data collected through picture narration and story re-telling, or through word-based elicitation. Moreover, corpus data are noninvasive, and easy to collect and analyze. As we have demonstrated, an average of a little more than ten minutes of natural talk yields adequate data that allow us to detect language deficits in people with mild cognitive impairment, the prodromal stage of dementia. The corpus-linguistic method reported here offers a reliable and cost-effective tool for detecting linguistic signs of cognitive decline, helping medical practitioners in the early diagnosis, intervention and management of the progressive disease.

Methods
About Singaporean English
Singapore is a small city state of some 5.7 million people, of whom 4 million are citizens and permanent residents, according to the latest census figures released on the government's website (https://www.singstat.gov.sg/). Nearly one million residents are 60 years of age or older, constituting 22% of the population. It is a multilingual country. Since the founding of Singapore as a British crown colony in 1819, most of the early immigrants hailed from southeastern China, southern India, Malaysia and the surrounding Riau Islands of Indonesia. 29 When it gained independence in 1965, Singapore recognized four official languages, reflecting the origins of most of its immigrants: Chinese (Mandarin), Malay, Tamil, and English, with Malay having the additional title of national language, and English that of working language. For Chinese Singaporeans, in addition to Mandarin, there are mutually unintelligible dialects, the major ones being Hokkien, Teochew and Cantonese. Since the dialects share a common grammatical and lexical core, 30 we group them together as a single language. At the present time, according to the government's census survey, most Singaporeans are multilingual, with English as the dominant home language for nearly half of the households, and as the common lingua franca. Due to the constant contact with the local languages, the English language in Singapore has undergone extensive lexical and grammatical change, incorporating words (tau 'soy' from Chinese; atas 'arrogant' from Malay) and grammatical features from the local languages (one as a particle for emphasis). 31 Despite the fact that it is the native language of a sizable segment of the population, Singaporean English has not reached the same level of register differentiation as American or British English. 32

The Language Samples
Our language samples are verbatim transcripts of the recordings of free-flowing speech by participants in a cohort study of ageing and mental health among Singaporeans who are 60 years of age or older. The aims and methods of the cohort study have been described elsewhere. 8 Here, we describe how the language data were processed. Language sampling was entirely voluntary. The recording took place in a normal office setting, with small digital recorders. Participants were told to talk about any topic for up to 20 minutes, in the language that they felt most comfortable with. They were aware that they were being recorded. Interviewer participation was kept to the minimum to allow subjects to plan their speech as free from intervention as is practical. Words uttered by interviewers were removed from the dataset.
We used the Stanford part-of-speech tagger to assign parts of speech to the words in our dataset. The tagger uses the Penn Treebank tagset. 12,13 In that tagset, the same tag TO is used for to both as a preposition (to school) and as the infinitival marker (to go). We separate the two functions, and use TO for the infinitival to only. We also introduced two tags: SFP for sentence-final particles (Ok lah) and FRG for fragments, which are common in unprepared speech (frfragments).
When tagging Singaporean English materials, the Stanford tagger's success rate is about 85%. Part of the reason for the low success rate is the frequent use of words which are unique to Singaporean English, including foreign words (tau huay 'soy pudding'), and English words that have developed local uses or meanings. Consider one as an example. It is a cardinal number (one school) or a pronominal (last one). These are tagged as one_CD and one_NN, respectively. In addition to these two uses, one is also used in Singaporean English as a sentence-final particle to express emphasis: My daughter is very active one 'My daughter is very ACTIVE!' The Stanford tagger tags one as a cardinal number here. To ensure accurate part-of-speech assignment, the tagged words were vetted by a separate group of student research assistants who were trained in formal linguistics at the National University of Singapore.
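As an illustration of how tagged output of this kind can be turned into counts, the following Python sketch reads tokens in the word_TAG form seen in one_CD and one_NN, and tallies nouns, verbs and all tagged words. It is a sketch under stated assumptions rather than the procedure used in the study: the function names are hypothetical, the noun and verb groups are simply the standard Penn Treebank NN* and VB* tags, and whether proper nouns counted toward the noun totals in the study is not stated here.

```python
from collections import Counter

# Standard Penn Treebank tags for nouns and verbs.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def parse_tagged(text):
    """Split whitespace-separated word_TAG items into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Split on the LAST underscore so words containing "_" survive.
        word, sep, tag = item.rpartition("_")
        if not sep or not word or not tag:
            continue  # skip items that are not in word_TAG form
        pairs.append((word, tag))
    return pairs


def count_categories(pairs):
    """Count noun tokens, verb tokens and all tagged tokens."""
    counts = Counter()
    for _word, tag in pairs:
        if tag in NOUN_TAGS:
            counts["noun"] += 1
        elif tag in VERB_TAGS:
            counts["verb"] += 1
        counts["all"] += 1
    return counts


if __name__ == "__main__":
    # The example sentence from the text, with "one" carrying the added
    # SFP tag; the other tags are illustrative.
    tagged = "My_PRP$ daughter_NN is_VBZ very_RB active_JJ one_SFP"
    print(count_categories(parse_tagged(tagged)))
```

With the manually corrected SFP tag, the sentence-final one is counted as a tagged word but not as a noun, which is the kind of difference the manual vetting step is meant to capture.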
Three sample texts are shown below.
Extract 1 (64, male, cognitively healthy)
Transcribed: I came from a a very poor family. Um I grew up in er, you know, in those days where Singapore is a slum. So I've witnessed uh riots. I've witnessed er curfew. And er then I've also experience, I've experienced when er, you know, just sharing one bowl of tau huay for 10 person in the family you know and also to the the extreme is er just er plain rice and then with a sauce and oil and sauce, you know. Sometimes this goes on for weeks ah.

why I felt that the secular practice of meditation and in in in this instance we are talking about mindfulness practice, which is really an approach, er a particular approach to meditation, can be helpful to everyone, is because er the teachings have been made secular, with hardly any reference to its religious origin, although we would mention that the approach er is founded on the b the the Buddha's teachings of meditation.

The tagged data were processed with AntConc, a common concordance tool used by corpus linguists. 34

Statistical Analysis
Two-tailed t-tests on the age, years of education, languages spoken and talk time of the subjects who provided speech samples, and on the speech rate and concreteness score data, were performed in SPSS v.27.