Irregular word reading as a marker of cognitive and semantic decline in Alzheimer’s disease rather than an estimate of premorbid intellectual abilities.

Abstract
Background: Irregular word reading has been used to estimate premorbid intelligence in Alzheimer’s disease (AD) dementia. However, reading models highlight the core influence of semantic abilities on irregular word reading, and these abilities show early decline in AD. The general aim of this study is to determine whether irregular word reading is a valid estimate of premorbid intelligence, or a marker of cognitive and semantic decline in AD.
Method: 681 healthy controls (HC), 104 subjective cognitive decline, 290 early and 589 late mild cognitive impairment (EMCI, LMCI) and 348 AD participants from the Alzheimer’s Disease Neuroimaging Initiative were included. Irregular word reading was assessed with the American National Adult Reading Test (AmNART). Multiple linear regressions were conducted predicting AmNART score using diagnostic category, general cognitive impairment and semantic tests. A generalized logistic mixed-effects model predicted correct reading using extracted psycholinguistic characteristics of each AmNART word. Deformation-based morphometry was used to assess the relationship between AmNART scores and voxel-wise brain volumes, as well as with the volume of a region of interest placed in the left anterior temporal lobe (ATL).
Results: EMCI, LMCI and AD patients made significantly more errors in reading irregular words compared to HC, and AD patients made more errors than all other groups. Across the AD continuum, as well as within each diagnostic group, irregular word reading was significantly correlated with measures of general cognitive impairment / dementia severity. Neuropsychological tests of lexicosemantics were moderately correlated with irregular word reading, whilst executive functioning and episodic memory were respectively weakly and not correlated. Age of acquisition, a primarily semantic variable, had a strong effect on irregular word reading accuracy whilst none of the phonological variables significantly contributed. Neuroimaging analyses pointed to bilateral hippocampal and left ATL volume loss as the main contributors to decreased irregular word reading performances.
Conclusions: Irregular word reading performances decline throughout the AD continuum, and therefore, premorbid intelligence estimates based on the AmNART should not be considered accurate in MCI or AD. Results are consistent with the theory of irregular word reading impairments as an indicator of disease severity and semantic decline.

Background
Alzheimer's disease (AD) is the most common cause of dementia (Alzheimer's Association, 2023). It is characterized by the insidious accumulation of beta-amyloid and tau proteins, with ensuing damage to neurons, accompanied by progressive cognitive and behavioral changes. The AD continuum is characterized by three phases: (a) preclinical, (b) mild cognitive impairment (MCI) and (c) AD dementia. One condition which has received increasing attention as an indicator of preclinical AD is subjective cognitive decline (SCD), described as the perception (by oneself or a close contact) of a worsening of one's mental abilities, despite seemingly unimpaired performance on objective tests (Jessen et al., 2014). SCD has been associated with an increased risk of future objective cognitive decline (Mitchell et al., 2014), as well as an increased likelihood of biomarker abnormalities consistent with AD pathology (Visser et al., 2009). In the intermediate stage between SCD and AD dementia, MCI patients present objective impairment in one or more cognitive domains (Albert et al., 2011), but their cognitive changes are mild enough that they require minimal aid or assistance and retain independence of function in their daily life. AD dementia, on the other hand, is associated with more significant cognitive impairments in at least two cognitive domains, which in this case interfere with independence and activities of daily living (McKhann et al., 2011). Typical amnestic AD dementia most prominently affects the learning of new information (episodic memory), but deficits can also be observed in language, visuospatial or executive functions, and through behavioral abnormalities or personality changes. Measuring cognitive decline is therefore central in assessing individuals on the AD continuum. To establish cognitive decline, clinicians will often rely on self- and relative-reported changes, as well as comparisons to demographically-adjusted norms of cognitive performance in healthy individuals. Another method is to compare current abilities to an estimate of one's baseline abilities before they were affected by the disease, often referred to as premorbid abilities.
Historically and across many countries, one of the ways to estimate premorbid abilities in patients has been the administration of irregular word reading tests (Nelson, 1982; Blair and Spreen, 1989; Schmand et al., 1991; Grober and Sliwinski, 1991; Beardsall and Huppert, 1994; Yi et al., 2017). This approach relies on the assumptions that (a) the reading ability reached by a normal adult is related to their general intelligence and (b) once reading becomes a highly practiced and overlearned skill, it can be maintained at a high level despite deterioration in other areas of intellectual functioning (Nelson and McKenna, 1975). In 1978, Nelson and O'Connell introduced the first irregular word reading test, the New (later changed to "National") Adult Reading Test (NART). The logic behind the use of irregular word reading, as opposed to regular word reading, in estimating premorbid intelligence is that irregular word reading relies on familiarity with specific words that have exceptional spelling. For example, "pint" can only be read correctly by a person who knows the word and recognises it. Its pronunciation cannot be guessed through the application of common rules of grapheme-phoneme correspondence, as that would only result in reading it to rhyme with "mint". Therefore, the accurate reading of less frequent irregular words would indicate a larger premorbid vocabulary, which would be related to a high premorbid intellectual quotient (IQ). This assumption was verified on many occasions in healthy adults, most recently when the NART was standardized against the Wechsler Adult Intelligence Scale IV, with the two tests correlating at r = .69 (Bright et al., 2018).
While this assumption was mainly tested in healthy adults, the situation might be different in neurodegenerative disorders. In 1996, Taylor and colleagues pointed out that if estimates of premorbid IQ in patients with neurodegenerative disorders are to be considered valid and accurate, they should (a) not significantly change as the disease progresses in severity and (b) not differ significantly from those of demographically matched control subjects (i.e., cognitively unimpaired older adults). These assumptions are consistent with the fact that crystallized intelligence remains relatively stable across the lifespan. In line with these criteria, many cross-sectional studies support the use of NART-like tests in estimating the premorbid IQ of AD patients (Nebes et al., 1984; O'Carroll and Gilleard, 1986; Cummings et al., 1986; Hart et al., 1986; Crawford et al., 1988; Schlosser and Ivison, 1989; Sharpe and O'Carroll, 1991; Raymer and Berndt, 1996; Johnstone et al., 1996; Maddrey et al., 1996; Smith, 1997; Paolo et al., 1997; Law and O'Carroll, 1998; Bright et al., 2002; Luzzatti et al., 2003; McGurn et al., 2004; Matsuoka et al., 2006; Alves et al., 2013; Yi et al., 2017). According to those studies, their respective NART-like tests are (at least sufficiently) dementia-insensitive to provide a useful measure of premorbid IQ. This conclusion rests on the fact that they did not observe significant differences in NART-like test scores between demographically-matched healthy controls (HC) and AD participants, despite those groups differing significantly on regular IQ tests. This absence of group differences would indicate, at least in theory, that the NART-like score of those AD participants has not declined significantly since before they developed neurodegenerative symptoms, and that the difference between the NART-estimated IQ and the current IQ would serve to quantify the AD patient's true cognitive decline. In addition to this cross-sectional support, one longitudinal study found that irregular word reading scores remained stable over a period of 2 years in mild AD patients (Rolstad et al., 2008).
In conflict with these conclusions, however, many cross-sectional studies have observed significant differences in NART-like test scores between demographically matched HC and AD participants, thus giving support to the theory that irregular word reading might be affected in AD dementia and that this widely used test does not give an accurate estimate of premorbid intelligence in this population (Rapcsak et al., 1989; Patterson et al., 1994; O'Carroll et al., 1995; Storandt et al., 1995; Hughes et al., 1997; Conway and O'Carroll, 1997; Glosser et al., 1999; Taylor, 2000; Pestell et al., 2000; Weekes, 2000; Colombo et al., 2000; Noble et al., 2000; Colombo et al., 2004; McFarlane et al., 2006). Even more importantly, several longitudinal studies also offer support to this theory in that they observed a significant decline in NART-like performance in AD participants over time (Fromm et al., 1991; Paque and Warrington, 1995; Taylor et al., 1996; Strain et al., 1998; Cockburn et al., 2000; Pavlik et al., 2006; Grober et al., 2008; Lowe and Rogers, 2011; Weinborn et al., 2018).
These conflicting lines of research could be the result of different factors. A first problem with many of the aforementioned studies is that they were conducted in the 1990s and early 2000s, when concepts like SCD and MCI did not yet exist. The same can be said for biomarkers and AD dementia criteria, which were not as well developed at the time (McKhann et al., 2011). In those older studies, it is possible that SCD or MCI participants were classified as normal controls or that other types of dementia were diagnosed as AD dementia. Of note, SCD patients are absent from all the aforementioned studies whilst only three included MCI participants (that is, Alves et al., 2013; Yi et al., 2017; Weinborn et al., 2018). Previous studies were also conducted using relatively small samples, most often with fewer than 50 AD participants. This raises particular concern about the studies supporting the accuracy of irregular word reading premorbid IQ estimates in AD dementia because, in some, nonsignificant differences suggested that a larger sample size would reveal statistical and clinical significance in control-AD comparisons (Nebes et al., 1984; Maddrey et al., 1996; Paolo et al., 1997). Nonetheless, when focusing on studies with larger sample sizes and/or longitudinal studies (vs. cross-sectional studies), the evidence seems to be against the use of irregular word reading as a marker of premorbid IQ in AD dementia. It is also notable that even in studies supporting their use, NART-like tests were often found to be accurate only at certain, earlier stages of the AD continuum, whilst becoming inaccurate in more severe stages. The stage at which inaccuracies appear varies from study to study, ranging from MCI to moderately severe AD.
As an alternative to the theory that irregular word reading is a measure of premorbid intelligence in AD dementia, some studies suggest that its impairment might reflect a semantic decline (Rapcsak et al., 1989; Fromm et al., 1991; Patterson et al., 1994; O'Carroll et al., 1995; Storandt et al., 1995; Taylor et al., 1996; Strain et al., 1998; Glosser et al., 1999; Cockburn et al., 2000; Brambati et al., 2009). This hypothesis is in line with models of reading that consider the core influence of semantic processes on irregular word reading (Seidenberg and McClelland, 1989; Patterson and Hodges, 1992; Plaut et al., 1996; Coltheart, 2006; Woollams et al., 2007; Taylor et al., 2013; Taylor et al., 2015; Chapleau et al., 2017). Consistent with this idea, AD performances on reading and writing tasks that rely to a lesser extent on semantic processing (e.g., reading or writing of words with regular grapheme-phoneme mappings) appear to be qualitatively more similar to, than divergent from, normal performances, in contrast with tasks requiring semantic processing such as exception word reading (Glosser et al., 1999). This is further supported by a co-occurring and proportionally similar decline in semantic performances (as measured for instance by picture naming performance) and irregular word reading (Strain et al., 1998). Thus, it would appear that a core semantic memory deficit may be the mechanism underlying impaired irregular word reading in AD dementia, in line with a large body of work suggesting that semantic memory impairments are an early and predominant symptom in MCI and AD dementia (Predovan et Chapleau et al., 2016). Nonetheless, the hypothesis of a semantic deficit causing irregular word reading deficits in the AD continuum remains debated, and more evidence is needed to draw solid conclusions regarding the underlying cognitive and neural mechanisms of irregular-word reading in these patients.
The aim of the present article is to assess, over a large, well-characterized sample representative of the AD continuum, whether irregular word reading performances (a) significantly differ between diagnostic categories across this continuum and (b) are linked to general cognitive impairment / dementia severity. We hypothesize (1) that demographically-matched MCI and AD participants will perform significantly worse than controls on irregular word reading and (2) that irregular word reading will be correlated with general cognitive impairment / dementia severity. If these two hypotheses are supported by our results and irregular-word reading performance is not maintained at different stages of AD, we will investigate three additional aims, namely whether the performance on irregular word reading is linked (c) to semantic neuropsychological tests; (d) to psycholinguistic variables associated with semantic processes (but not to psycholinguistic variables associated with phonological processes) and (e) to brain volumes in regions associated with semantic processing. These analyses will help clarify the underlying cognitive and neural mechanisms of irregular word reading deficits. We hypothesize that (3) we will observe a stronger correlation between irregular word reading and tests of semantic processes (e.g., picture naming), as opposed to other tests (e.g., executive functions or episodic memory); (4) the accuracy of single items of the irregular word reading test will be associated with the lexicosemantic variables of the words (e.g., number of senses, semantic neighborhood, concreteness or age of acquisition) as opposed to phonological variables (e.g., number of phonemes, syllables or phonological neighborhood) and (5) finally, we should find neural correlates of semantics, namely the left ATL, to be related to irregular word reading performance.

Methods
The data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI began in 2004 as a public-private partnership under the leadership of Dr. Michael W. Weiner. The primary goal of ADNI has been to detect AD dementia at the earliest possible stage (pre-dementia) and identify ways to track the disease progression. To that end, data from magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers as well as clinical and neuropsychological assessments have been collected to test if they can be combined to measure the progression of the various stages of the AD continuum. The initial five-year study (ADNI-1) was extended by two years in 2009 by a Grand Opportunities grant (ADNI-GO), and in 2011 and 2016 by further competitive renewals of the ADNI-1 grant (ADNI-2 and ADNI-3, respectively). For up-to-date information, see www.adni-info.org.

Participants
Participants across all ADNI studies (1, GO, 2 and 3) who had American National Adult Reading Test (AmNART) scores available at their baseline assessment were included in this study. All participants, aged between 54 and 91 years (inclusive), had completed a minimum of six years of education and did not have vascular dementia, depression, sensory disturbances, or other medical conditions that could interfere with the study. A study partner who had frequent contact with the participant (an average of 10 h per week or more) also accompanied them to visits and filled out questionnaires.
The HC status was reserved for participants free of memory complaints, verified by a study partner, beyond what one would expect for age, and with normal memory function documented by scoring above education-adjusted cutoffs on the Logical Memory II subscale (LM II) delayed paragraph recall from the Wechsler Memory Scale-Revised (WMS-R): (a) ≥ 9 for 16 or more years of education; (b) ≥ 5 for 8-15 years of education; and (c) ≥ 3 for 0-7 years of education. Additionally, HC participants had a Mini-Mental State Examination (MMSE) score between 24 and 30 (inclusive), a Clinical Dementia Rating (CDR) of 0, and no significant impairment in activities of daily living. There was no criterion regarding memory complaints.
Participants classified as SCD met the same criteria as HC participants on the WMS-R LM II, MMSE and CDR, and presented no significant impairment in activities of daily living. Unlike their HC counterparts, SCD participants presented a significant subjective memory concern as reported by the subject, study partner, or clinician, confirmed by a Cognitive Change Index score ≥ 16.
Participants were classified as EMCI if they presented subjective memory concerns as reported by the subject, their study partner or clinician; had abnormal memory function documented by scoring within the education-adjusted ranges on the WMS-R LM II, inclusively (a) 9-11 for 16 or more years of education, (b) 5-9 for 8-15 years of education, and (c) 3-6 for 0-7 years of education; had an MMSE score between 24 and 30 (inclusive); and had a CDR score of 0.5. Their general cognition and functional performance were sufficiently preserved that a diagnosis of AD could not be made.
Participants were classified as LMCI if they presented subjective memory concerns as reported by the subject, their study partner or clinician; had abnormal memory function documented by scoring within the education-adjusted ranges on the WMS-R LM II, namely (a) ≤ 8 for 16 or more years of education, (b) ≤ 4 for 8-15 years of education, and (c) ≤ 2 for 0-7 years of education; had an MMSE score between 24 and 30 (inclusive); and had a CDR score of 0.5. Their general cognition and functional performance were sufficiently preserved that a diagnosis of AD could not be made.
In addition to the ADNI general inclusion and group classification criteria, we applied two criteria specific to this study. The first was that participants be native English speakers (excluded N = 33). The second was consistency between total AmNART scores and single item-level data on this test, when available, in the ADNI database (excluded N = 52). Of the original 2097 participants, 2012 remained after these exclusions: 681 HC, 104 SCD, 290 EMCI, 589 LMCI and 348 AD. Demographics of this final sample are provided in the Results section.
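The education-adjusted LM II ranges above can be expressed as a small lookup, which also makes visible that the HC/SCD and EMCI ranges overlap (so the CDR score and memory concern are what separate those groups). This is a minimal sketch: the function and label names are hypothetical, only the LM II criterion is encoded, and the AD criteria are not covered.

```python
# Illustrative only: band edges are copied from the criteria in the text;
# the helper names and group labels are assumptions made for this sketch.

def education_band(years):
    """Map years of education to the band used for LM II cutoffs."""
    if years >= 16:
        return "16+"
    if years >= 8:
        return "8-15"
    return "0-7"

# (low, high) inclusive bounds on LM II delayed recall; None means unbounded.
LM2_RANGES = {
    "16+":  {"HC/SCD": (9, None), "EMCI": (9, 11), "LMCI": (None, 8)},
    "8-15": {"HC/SCD": (5, None), "EMCI": (5, 9),  "LMCI": (None, 4)},
    "0-7":  {"HC/SCD": (3, None), "EMCI": (3, 6),  "LMCI": (None, 2)},
}

def compatible_groups(lm2_score, education_years):
    """Return the groups whose LM II range contains the score."""
    ranges = LM2_RANGES[education_band(education_years)]
    groups = []
    for label, (low, high) in ranges.items():
        if (low is None or lm2_score >= low) and (high is None or lm2_score <= high):
            groups.append(label)
    return groups

# Sample-size bookkeeping reported in the text.
remaining = 2097 - 33 - 52
group_sizes = {"HC": 681, "SCD": 104, "EMCI": 290, "LMCI": 589, "AD": 348}
```

For instance, `compatible_groups(10, 16)` returns both `"HC/SCD"` and `"EMCI"`, whereas a score of 8 with the same education is compatible with `"LMCI"` only.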

Diagnosis of AD was made in participants with
2.2 Procedure

Cognitive assessments
AmNART. To measure irregular word reading abilities in an American population, the AmNART (sometimes called ANART) was used. This test is an adaptation of the original British NART (Nelson, 1982) developed specifically for the American English population to estimate premorbid intelligence through irregular word reading (Grober and Sliwinski, 1991). The version used by ADNI comprises a list of 50 irregular words, about half of which are identical to those of the NART. Irregular words, also known as exception words, are words whose actual pronunciation differs from what would be predicted based on the application of grapheme-to-phoneme mapping (e.g., pint, cellist). They are intended to be printed in order of increasing difficulty and are relatively short to avoid the possible adverse effect of stimulus complexity. With no time limit, the subject is instructed to read aloud down the list of words; errors made in pronouncing each word are then recorded into an "error score". Participants are allowed to self-correct but are not prompted to do so unless it was difficult to hear what was said and it is necessary to determine whether the pronunciation was correct or incorrect. If they hesitate between two different pronunciations, one correct and the other incorrect, they are asked which one they think is best.
To assess the involvement of psycholinguistic variables in the successful reading of irregular words, we extracted characteristics for each of the 50 AmNART irregular words using the English Lexicon Project (ELP; Balota et al., 2007) as well as WordNet (Miller, 1995)
Trail making Part-B. To measure executive functioning, the trail making test was used. More specifically, we used scores obtained on part B of the test, which depends on visuomotor and perceptual-scanning skills and requires considerable cognitive flexibility in shifting between number and letter sets under time pressure (Partington and Leiter, 1949). The participant is presented with 25 circles containing the numbers 1 through 13 and the letters A through L, scattered across the given medium. The participant must connect the circles while alternating between numbers and letters in ascending order (e.g., 1 to A; A to 2; 2 to B; B to 3). They have up to 300 seconds to complete the test, and their completion time (in seconds) is recorded as their score.
Rey Auditory Verbal Learning Test (30-minute delay). To measure episodic memory, we used the Rey Auditory Verbal Learning Test (AVLT; Rey, 1964). Over five learning trials, participants are read a list of 15 words (list A) and are asked to recall them immediately, without regard to order. After the fifth learning trial, the same task is done using an interfering list (B). Immediately and 30 minutes after administration of list B, list A is recalled, this time without first being read. Scores from the 30-minute delay test were used as our measure of episodic memory.

Neuroimaging
All participants received T1-weighted (T1w) MRIs (see http://adni.loni.usc.edu/methods/mri-tool/mrianalysis/ for the detailed MRI acquisition protocols). T1w scans for each participant were pre-processed through our standard pipeline, including denoising. Deformation-based morphometry (DBM) was performed to measure the local anatomical differences in the brains of the participants by estimating the Jacobian determinant of the inverse of the estimated nonlinear deformation field as a proxy of atrophy (Dadar et al., 2020). DBM values reflect the relative volume of the voxel with respect to the template; i.e., a value of 1 indicates similar volume to the same region in the template, values lower than 1 indicate volumes smaller than the corresponding region in the template, and values higher than 1 indicate volumes larger than the corresponding region in the template. Therefore, lower DBM values can be interpreted as a reduction in the structure volume, i.e., regional atrophy. Voxel-wise DBM maps were used to assess the relationship between brain atrophy and AmNART scores at a voxel level. In addition, mean DBM values within a region of interest (ROI) including the left anterior temporal lobe were used to assess the relationship between atrophy in the left anterior temporal lobe and AmNART scores.
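To make the interpretation of DBM values concrete, the sketch below computes a Jacobian determinant by central finite differences for a toy 3D mapping. It is not the pipeline used in the study: the `shrink` mapping, the function names and the step size are illustrative assumptions, chosen so that a uniform 10% contraction along each axis yields a relative volume below 1.

```python
# Toy illustration of a Jacobian determinant as a local volume ratio.
# All names here are hypothetical; real DBM operates on nonlinear
# deformation fields estimated by image registration.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def local_jacobian(transform, point, step=1e-3):
    """Central finite-difference Jacobian J[k][axis] = d f_k / d x_axis."""
    columns = []
    for axis in range(3):
        plus, minus = list(point), list(point)
        plus[axis] += step
        minus[axis] -= step
        f_plus, f_minus = transform(plus), transform(minus)
        columns.append([(f_plus[k] - f_minus[k]) / (2 * step) for k in range(3)])
    # columns is indexed [axis][k]; transpose to the usual [k][axis] layout.
    return [[columns[axis][k] for axis in range(3)] for k in range(3)]

def shrink(p):
    """Assumed toy mapping: 10% contraction along every axis."""
    return [0.9 * coordinate for coordinate in p]

dbm_value = det3(local_jacobian(shrink, [10.0, 20.0, 30.0]))
```

Here `dbm_value` is close to 0.9 cubed (about 0.729), i.e., a value below 1 that would be read as a local volume smaller than the template.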

Behavioral analyses
To describe the sample, Pearson's chi-square test was used to assess sex differences. One-way analysis of variance (ANOVA) and Tukey post-hoc testing were used for all other variables.
To test the hypotheses that AmNART scores are dementia-insensitive and semantic-related, we fitted several multiple linear regressions predicting AmNART total error score from (1) diagnostic category, extracting an ANOVA table to test the factor as a whole, (2) tests of severity (MMSE, MoCA) and (3) neuropsychological tests (BNT, TRAIL-B, AVLT). All models controlled for sex, age, and education; models assessing the relation to neuropsychological tests additionally controlled for severity as measured by the MMSE. The MMSE was favored to control for severity as results on the MoCA were not available for the whole sample.
Epsilon squared was used as a measure of effect size (ε²; Okada, 2013). Stein's formula was used to calculate adjusted R² (Stevens, 2002).
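As a rough illustration of these two quantities, the sketch below uses the common textbook expressions: partial epsilon squared recovered from an F statistic and its degrees of freedom, and Stein's cross-validity adjustment of R². The function names are hypothetical, and the source does not state which software or exact variant the authors used, so this should be read as an assumption rather than their implementation.

```python
# Hedged sketch of two effect-size formulas; names are illustrative.

def partial_epsilon_squared(f_value, df_effect, df_error):
    """Partial epsilon squared from an F test: (F - 1) * df1 / (F * df1 + df2)."""
    return (f_value - 1.0) * df_effect / (f_value * df_effect + df_error)

def stein_adjusted_r2(r2, n, k):
    """Stein's adjusted R^2 for n observations and k predictors."""
    shrinkage = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1.0 - shrinkage * (1.0 - r2)

# Group effect reported in the Results: F[4, 2004] = 52.20.
group_effect = partial_epsilon_squared(52.20, 4, 2004)
```

With the F statistic reported for the diagnostic-group effect, `group_effect` rounds to .09, in agreement with the value given in the Results.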
To assess the involvement of each psycholinguistic variable extracted from the AmNART (as presented in 2.2.1) in irregular word reading, we analyzed single-item accuracy with a generalized logistic mixed-effects model using the lme4 package (Bates et al., 2015). This analysis was conducted on a subsample of participants who had single item-level AmNART data available, as opposed to only having the total AmNART score available (195 HC, 323 LMCI, 156 AD).

Neuroimaging analyses
Similar linear regression models were used to assess the relationship between AmNART scores and voxel-wise DBM values in the subset of participants that had MRI information available (N = 1863), controlling for age, sex, and level of education. A second set of models was also run with diagnostic category as an additional covariate. Voxel-wise results were corrected for multiple comparisons using a False Discovery Rate (FDR) controlling technique, with a significance threshold of 0.05.
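The source does not name the specific FDR procedure. Assuming the widely used Benjamini-Hochberg step-up rule, the correction over a set of voxel-wise p-values could be sketched as follows; the function name and the example p-values are hypothetical.

```python
# Benjamini-Hochberg step-up procedure (an assumption: the text only says
# "FDR controlling technique"). Pure-Python sketch for a flat list of p-values.

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its threshold q * rank / m.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that largest rank.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected

example = benjamini_hochberg([0.001, 0.012, 0.2, 0.025, 0.9])
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value further down the ordering meets its threshold.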
Second, we conducted an ROI-based analysis to test the specific hypothesis of a relationship between the volume of the left ATL and irregular word reading in the subsample of participants who had neuroimaging data available (N = 1863). To do so, we fitted a multiple linear regression predicting AmNART total error score from the DBM in the left ATL, controlling for sex, age, and education, with and without diagnostic category as a covariate, similar to the voxel-level analyses. The ATL ROI was selected from a previous study (Borghesani et al.).
Neuropsychological and language evaluations broadly revealed the expected patterns of impairment across the AD continuum. First, measures of severity worsened along the continuum of disease progression stages. Second, episodic memory deficits are predominant, but cognitive decline gradually extends to other cognitive domains.

Irregular word reading across the AD continuum
When controlling for sex, age, and education, AmNART total error score significantly differed between diagnoses (F[4, 2004] = 52.20, p < .001, partial ε² = .09; Fig. 1). Overall, patient groups with more advanced disease progression on the AD continuum made more errors on irregular word reading. Specifically, as seen in Fig. 1, AD dementia participants showed significantly lower performance compared to all other groups. In addition, HC scores also differed significantly from those of EMCI and LMCI. Means and standard deviations of AmNART total scores as well as significant differences are presented in Table 2 (more detailed t ratios, p values and effect sizes for each contrast are presented in supplementary table 1).
Of note, we observed the presence of 18 outlier participants (AmNART total error score deviated by ± 3.29 z-scores in comparison to the average and standard-deviation of their respective diagnostic group), more precisely 12 HC, 3 EMCI and 3 LMCI.However, excluding these participants did not impact any of the results of the analyses.
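The outlier rule described above (a score more than 3.29 standard deviations from the mean of the participant's own diagnostic group) can be sketched as follows. The function name and the example scores are made up for illustration, and the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

def flag_outliers(scores_by_group, z_cutoff=3.29):
    """Flag scores whose within-group |z| exceeds the cutoff."""
    flagged = {}
    for group, scores in scores_by_group.items():
        mu = mean(scores)
        sd = stdev(scores)  # sample SD (assumption; not stated in the text)
        flagged[group] = [
            score for score in scores
            if sd > 0 and abs(score - mu) / sd > z_cutoff
        ]
    return flagged

# Hypothetical error scores, not ADNI data.
toy_scores = {
    "HC": [10] * 19 + [40],
    "AD": [8, 9, 10, 11, 12],
}
toy_flags = flag_outliers(toy_scores)
```

In this toy input only the score of 40 in the first group is flagged; the second group has no value far enough from its own mean.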

Association between irregular word reading and general cognitive impairment / severity
Whole-sample and group-specific partial correlations between AmNART and measures of disease severity/global cognition (MoCA and MMSE) are presented in Fig. 2. Both measures of severity were significantly correlated with total AmNART scores, in all diagnostic groups as well as across the whole sample, further supporting a strong link between AD disease progression and impaired irregular word reading.

Association between irregular word reading and lexicosemantic, executive functioning and episodic memory performances
Whole-sample and group-specific partial correlations between AmNART and the chosen neuropsychological tests (BNT, Trail making part-B and AVLT delayed recall) are presented in Fig. 3. Total AmNART irregular word reading scores were significantly and moderately correlated with BNT scores (measuring picture naming or lexicosemantic abilities), weakly but significantly correlated with the Trail making part-B (measuring executive functioning), and poorly correlated with the AVLT delayed recall (measuring episodic memory), the latter correlation being significant only in the EMCI group (p < .001) and across the whole sample (p < .05).
The model created to distinguish between the involvement of lexicosemantic, executive and memory functions in irregular word reading is presented in Table 2. Consistent with the correlational analyses, we observed that the BNT provides a strong contribution to the model (standardized β = -0.31, p < .001), the trail making provides a weak but significant contribution (standardized β = -0.06, p < .001) and the AVLT delayed recall does not provide a significant contribution (p = .887).

Association between irregular words and psycholinguistic variables (lexicosemantic and phonological)
To better understand the relationships between the characteristics of AmNART irregular words and correct reading, we first selected a subsample for whom single item-level AmNART data was available (195 HC, 323 LMCI, 156 AD). The model used to predict irregular word item success based on their psycholinguistic variables is presented in Table 3. While none of the phonological variables had a significant effect on irregular word reading accuracy, there was a significant effect of age of acquisition (β = -0.42, z = -5.62).
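For readers less familiar with this type of model, one plausible way to write it is given below. The notation is an assumption made for illustration: y_ij is whether participant i read item j correctly, x_jk is the k-th psycholinguistic variable of item j, and u_i and w_j are random intercepts for participants and items. The source does not report the random-effects structure, nor whether predictors were standardized, so both the structure and the odds-ratio reading of the coefficient are hedged.

```latex
% Assumed form of the item-level logistic mixed-effects model
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \sum_{k} \beta_k\, x_{jk} + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad w_j \sim \mathcal{N}(0, \sigma_w^2)

% Reading the reported age-of-acquisition coefficient on the odds scale
\mathrm{OR}_{\mathrm{AoA}} = e^{\beta_{\mathrm{AoA}}} = e^{-0.42} \approx 0.66
```

Under this reading, a one-unit increase in age of acquisition (in whatever units the predictor was entered) would multiply the odds of a correct reading by roughly 0.66, holding the other variables constant.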

Link between irregular word reading and brain volumes
Figure 4 shows the significant associations between voxel-wise DBM maps and AmNART scores, including age, sex, and education level as covariates, after correction for multiple comparisons (FDR). At a voxel-wise whole-brain level, we observed significant correlations with bilateral medial temporal lobe regions, including the hippocampi, as well as with the ATL, the inferior and middle temporal gyri, and the fusiform gyrus, predominantly in the left hemisphere. However, no voxels survived FDR correction after including the diagnostic group as a covariate. At the ROI level, ATL DBM values were significantly associated with AmNART scores when including age, sex, and education as covariates (standardized β = -0.11, p < .001). Furthermore, this association remained significant after including diagnostic group as an additional covariate (standardized β = -0.05, p < .05). The model used to predict AmNART error score based on brain volumes in the ATL is presented in Table 4.
Relation between voxel-wise DBM maps and AmNART error score

Discussion
The present study aimed to assess, over a large and well-characterized sample of participants on the AD continuum, whether irregular word reading performance is an accurate indicator of premorbid intelligence, or a marker of general cognitive and semantic deficits. Results showed that EMCI, LMCI and AD patients make significantly more errors in reading irregular words compared to HC, and that AD patients also make significantly more errors than all other groups. Across the whole AD continuum, as well as within each diagnostic group, irregular word reading abilities were further significantly correlated with measures of general cognitive impairment / dementia severity. This suggests that irregular word reading performances decline throughout the AD continuum, and that even at a finer grain beyond diagnostic categories, a strong link exists between dementia severity and irregular word reading difficulties. Premorbid IQ estimates based on this measure should therefore not be considered accurate in patients with MCI or AD. Furthermore, results indicated a significant moderate association between irregular word reading and neuropsychological tests of lexicosemantics, as opposed to a weak association with executive function and no association with episodic memory. At the item level, none of the phonological variables had a significant effect on irregular word reading accuracy whilst age of acquisition, a semantic variable, provided a significant contribution. Finally, the whole-brain neuroimaging analysis pointed to hippocampal and left ATL volume loss as the main contributors to decreased irregular word reading performances. These results are consistent with the theory of irregular word reading impairments as an indicator of disease severity and semantic decline, as opposed to an indicator of premorbid IQ, and pave the way for further investigation on the matter.
underestimate cognitive changes in people with memory complaints, be more likely to underdiagnose AD continuum conditions or underestimate disease progression in those already diagnosed with one of these conditions. Therefore, it seems preferable for clinicians to rely on comparisons to demographically-adjusted norms of cognitive performance to establish cognitive decline, as well as on repeated measures over time.
Consistent with hypothesis 3, lexicosemantic abilities were the second-best predictor of irregular word reading performance, just behind education but well ahead of dementia severity and other cognitive functions (executive functions and episodic memory). The importance of lexicosemantic abilities in irregular word reading was also in line with hypothesis 4, as AmNART item success rate was significantly predicted by the age of acquisition of irregular words, which has been associated with semantic representations (Juhasz, 2005; Elsherif et al., 2023). This is consistent with the fact that NART-like tests are intended to bypass phonemic decoding by relying more heavily on a person's knowledge of the exceptional spellings associated with irregular words. Overall, this set of results highlights the strong association between irregular word reading and semantic abilities, as suggested by Strain and colleagues in 1998, and is consistent with the idea that semantic abilities have a core influence on irregular word reading performance, particularly in, but not limited to, the AD continuum population. These results are also consistent with models of reading that posit a core influence of semantic abilities on the correct reading aloud of irregular words, as emphasized by Taylor and colleagues in their 2015 review. Although not all semantic psycholinguistic variables significantly predicted correct reading, the significant involvement of age of acquisition is consistent with the idea that words acquired at a younger age and used more frequently within the population might be more strongly represented in semantic memory, enhancing the likelihood of successful reading. These results also show that executive functioning, episodic memory and phonology do not seem to be as crucial to irregular word reading performance.
Beyond the implication of lexicosemantic abilities as an underlying cognitive mechanism of irregular word reading, we were also interested in the underlying neural mechanisms. Results of our neuroimaging analyses were in line with hypothesis 5. The whole-brain analysis suggested that the volumes of the bilateral hippocampi, as well as of the ATL, the inferior and middle temporal gyri, and the fusiform gyrus, predominantly in the left hemisphere, were strongly related to AmNART performance. The significant correlation with the ATL, even when controlling for diagnostic group, was further confirmed in the ROI-based analysis. Atrophy of the hippocampus is a marker of AD disease severity. The hippocampus is known to be one of the key brain structures affected in AD dementia (Rao et
While the current study addresses several gaps in the literature (large sample size, well-characterized participants at four different stages on the AD continuum, investigation of the underlying cognitive and neural mechanisms of irregular word reading), these results also need to be considered within the context of several limitations. First, the cross-sectional design of the study cannot confirm that irregular word reading declines with time in participants on the AD continuum, as a longitudinal design would. Second, the ADNI cohort is not population-based and underrepresents ethnoculturally diverse populations; its participants are also highly educated and have fewer comorbidities compared to other cohorts (Birkenbihl et al., 2020). ADNI results must be interpreted with the caveat that they may have limited external validity for more diverse populations. Generalizing to other populations is further complicated by differences in how irregular words are experienced in languages with more transparent spelling-to-sound correspondences. Italian, for example, is more transparent than the more opaque English and is characterized by regular spelling-to-sound correspondence (Colombo et al., 2000). The same can be said for languages that incorporate phonograms or ideograms (e.g., Chinese, Japanese, Korean or Vietnamese), which could be more context-dependent or invoke greater imageability in reading. Third, item-level analyses were conducted on a subsample of participants for whom item-level AmNART data (as opposed to only the total AmNART score) were available (195 HC, 323 LMCI, 156 AD). Additionally, 20 words with missing values in age of acquisition, objective lexical frequency, concreteness, and/or phonological neighborhood had to be excluded from item-level analyses. Fourth, the use of the BNT as the only semantic test has certain limits, as picture naming involves distinct cognitive processes that are not limited to semantics. These include visual analysis of the picture; recognition of the stimulus as familiar; activation of the semantic representation of the object via the semantic system; a lexical-semantic process which directs selection and retrieval of semantic information in a task-appropriate way; modality-independent lexical access to the phonological word form of the object, that is to say the speech sounds used in the word; and the motor programming and articulation required for saying the word (DeLeon et al., 2007; Harry and Crowe, 2014). This is important because reading models posit the involvement of both lexical and semantic processing in the correct reading of irregular words (Taylor et al., 2013); these results should therefore not be interpreted as reflecting an involvement of semantics alone.

Conclusions
Measuring cognitive decline can be particularly challenging for clinicians when diseases as insidious as AD dementia may be involved. Cognitive decline will more often than not have to be estimated post hoc, blind to an individual's objective baseline performance. The first assessment, where only one time point is available, could prove critical to any intervention against the disease and its progression. Currently, clinicians have to rely on subjective complaints, demographically adjusted norms of cognitive performance and repeated measures. The results of this study lend support to the idea that irregular word reading tests do not provide an accurate estimate of premorbid IQ in the MCI-AD populations, as irregular word reading performance appears to decline significantly in these populations and is related to semantic impairments correlated with hippocampal and ATL volume loss. Relying on these estimates could lead clinicians to underestimate cognitive decline in people with those conditions. Premorbid estimates should rely on more crystallized forms of intelligence that are uncorrelated with disease severity, as evidenced by longitudinal studies in clinically diverse populations.
Relation between AmNART error score and diagnostic category

List Of Abbreviations
a memory complaint confirmed by a study partner (or reported only by the study partner), with abnormal memory function documented by scoring within the education-adjusted ranges on the WMS-R LM II, scoring (a) ≤ 8 for 16 or more years of education; (b) ≤ 4 for 8-15 years of education; and (c) ≤ 2 for 0-7 years of education, an MMSE score between 20 and 26 (inclusive), with a CDR score = 0.5 or 1, and who met the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria for probable AD.

Figure 2 (panel A)

Figure 3 (panel A)
; Del Ser et al., 1997; Vaskinn and Sundet, 2001; Ginsberg et al., 2003; Mackinnon and Mulligan, 2005; Matsuoka et al., 2006; Tallberg et al., 2006; Rolstad et al., 2008; Starkey and Halliday, 2011; Alves et al., 2012; (Shaoul and Westbury, 2010)
Objective lexical frequency was obtained from ELP and is the log10 of the number of times the word appears in the corpus + 1. The measure of orthographic neighborhood density was the orthographic Levenshtein distance to the 20 closest neighbors in the lexicon (OLD20; Yarkoni et al., 2008). Put simply, it is a measure of a word's similarity and proximity to other words in the lexicon. Specifically, the OLD20 of a given word is computed as the mean of the string edit distances from this word to its 20 closest orthographic neighbors in the lexicon. The edit distance used, Levenshtein distance (LD), corresponds to the number of operations (letter deletion, insertion, or substitution) needed to change one word into another: for example, the LD from smile to similes is 2 (two insertions: I and S).
The measure of age of acquisition was obtained from ELP, originally recorded by Kuperman and colleagues (2012) as the estimated age at which a word was learned. Age of acquisition has been shown to have larger effects in tasks involving semantic information (e.g., picture naming and lexical decision) than in tasks where semantic information is less involved (e.g., reading aloud; Juhasz, 2005; Elsherif et al., 2023). The measure of concreteness was obtained from ELP and is described by Brysbaert and colleagues (2014) as evaluating the degree to which the concept denoted by a word refers to a perceptible, relatable entity. The measure for number of senses was obtained from WordNet and is described by Miller (1995) as the number of contexts in which the word can be used, expressing the number of possible meanings it has. The measure for semantic neighborhood density was obtained from ELP and is described by Mirman and Magnuson (2008) as the number and/or proximity of neighboring representations, density referring to how tightly packed the words in the neighborhood are (Shaoul and Westbury, 2010).
For phonological variables, we used (a) the number of syllables, (b) the number of phonemes and (c) phonological neighborhood. The measure of phonological neighborhood was obtained from ELP and is, similarly to the aforementioned OLD20, the mean phonological LD to the 20 closest neighbors (PLD20).
The test presents objects in order of frequency, from most to least common, and is discontinued after 6 consecutive failures. ADNI only administers the odd-numbered items of the standard 60-item BNT, which gives a maximum score of 30.
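The OLD20 definition given above (mean Levenshtein distance to the 20 closest orthographic neighbors) can be illustrated with a minimal sketch. This is not the ELP or the authors' implementation: the function names `levenshtein` and `old_n` and the toy lexicon are assumptions introduced here for illustration, and a real computation would run over the full ELP lexicon with n = 20.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-letter insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from word to its n closest neighbors.

    If the lexicon holds fewer than n other words, all of them are used.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon contains no other words")
    return sum(closest) / len(closest)


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["smile", "simile", "similes", "smiles", "mile", "stile"]
print(levenshtein("smile", "similes"))   # the example from the text: 2
print(old_n("smile", toy_lexicon, n=3))  # mean distance to the 3 closest neighbors
```

The same function applied to phoneme strings instead of letter strings would give the phonological analogue (PLD20).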
data sets, prioritizing ELP, but using WordNet when data was not otherwise available. As control variables, we used (a) word length (number of letters), (b) objective lexical frequency, (c) orthographic neighborhood density and (d) summed bigram frequencies by position. Summed bigram frequency by position, where a bigram is defined as a sequence of two letters, was obtained from ELP and is a measure of bigram frequency that is sensitive to position within words, taking into account the letter positions at which the bigram occurs. For example, the bigram frequency for DO in DOG counts DO bigrams only when they appear in the first two positions of a word in the corpus. As lexicosemantic variables, we used (a) age of acquisition, (b) concreteness, (c) number of senses and (d) semantic neighborhood density.
Mini-Mental State Exam and Montreal Cognitive Assessment. To measure general cognitive impairment / dementia severity, we used scores obtained by participants on the Mini-Mental State Examination (MMSE; Folstein et al., 1975) and the Montreal Cognitive Assessment (MoCA; Nasreddine et al., 2005), two tests that are routinely used to screen a wide range of cognitive functions and identify patients on the AD continuum, as well as to determine disease severity.
Boston naming test. To measure lexicosemantic abilities, the Boston Naming Test (BNT; Kaplan et al., 1983) was used. It measures the ability to orally label (name) drawings of objects. Participants have 20 seconds to name what the drawing represents after being presented with the image. A semantic cue is given if the participant fails to recognize the picture (e.g., answering bench instead of tree) or if they state that they do not know what the picture represents. The semantic cue is either a short explanation about the item (e.g., for a mask: "it's part of a carnival fantasy") or a superordinate category (e.g., for a beaver: "it's a kind of animal").
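The position-sensitive bigram measure described above (the DO in DOG example) can be sketched as follows. This is an illustration under stated assumptions, not the ELP procedure: the helper names and the toy lexicon are hypothetical, and the counts here are simple type counts, whereas the text does not specify how ELP weights occurrences.

```python
from collections import Counter


def positional_bigrams(word: str) -> list[tuple[str, int]]:
    """Return each two-letter sequence of word paired with its starting position."""
    w = word.upper()
    return [(w[i:i + 2], i) for i in range(len(w) - 1)]


def build_positional_counts(lexicon: list[str]) -> Counter:
    """Count how many lexicon words contain each bigram at each position."""
    counts = Counter()
    for entry in lexicon:
        counts.update(positional_bigrams(entry))
    return counts


def summed_bigram_frequency_by_position(word: str, counts: Counter) -> int:
    """Sum the position-specific counts of every bigram in word."""
    return sum(counts[key] for key in positional_bigrams(word))


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["dog", "dot", "doe", "ado", "god", "log"]
counts = build_positional_counts(toy_lexicon)

# DO at the start of a word (dog, dot, doe) is counted separately
# from DO in second position (ado).
print(counts[("DO", 0)], counts[("DO", 1)])
print(summed_bigram_frequency_by_position("dog", counts))
```

Keying the counter on the (bigram, position) pair is what makes the measure position-sensitive; dropping the position from the key would give an ordinary summed bigram frequency.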
(Avants et al., 2008) intensity inhomogeneity correction (Sled et al., 1998) and intensity normalization into range. The pre-processed images were then both linearly (9 parameters: 3 translation, 3 rotation, and 3 scaling; Dadar et al., 2018) and nonlinearly (Avants et al., 2008) registered to a population-appropriate average template generated from 150 ADNI participants. The quality of all image processing steps, including the linear and nonlinear registrations, was visually verified by an experienced rater (MD).
Deformation-based morphometry (DBM)
In the generalized logistic mixed-effects model (Baayen, 2008), accuracy was predicted by length, lexical frequency, orthographic neighborhood, bigram frequencies by position, age of acquisition, concreteness, number of senses, semantic neighborhood density, number of syllables, number of phonemes and phonological neighborhood as fixed effects, with by-item and by-subject random intercepts as random effects. |z| values beyond 1.96 were deemed significant (Baayen, 2008). Bigram frequencies by position and number of senses were logarithmically transformed to normalize these variables. Twenty words with missing values in age of acquisition, objective lexical frequency, concreteness, and/or phonological neighborhood had to be excluded from this analysis. The remaining 30 words were ache, aisle, algae, asthma, blatant, bouquet, cellist, chord, courteous, debt, deny, depot, epitome, façade, gauge, heir, hiatus, hyperbole, naïve, nausea, papyrus, pint, placebo, scion, sieve, simile, subtle, superfluous, thyme and zealot.
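As a reading aid, the item-level model described above can be written out explicitly. The notation is an assumption introduced here and does not appear in the source: y_{ij} is 1 when subject j reads item i correctly, x_{k,i} is the k-th of the 11 psycholinguistic predictors for item i, and u_i and v_j are the by-item and by-subject random intercepts. The normal distribution of the random intercepts and the Wald form of the z statistic are the usual conventions for this kind of model, not details confirmed by the text.

```latex
\[
\operatorname{logit} \Pr(y_{ij} = 1)
  = \beta_0 + \sum_{k=1}^{11} \beta_k \, x_{k,i} + u_i + v_j,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j \sim \mathcal{N}(0, \sigma_v^2)
\]
\[
z_k = \frac{\hat{\beta}_k}{\operatorname{SE}(\hat{\beta}_k)},
\qquad
\text{predictor } k \text{ deemed significant if } |z_k| > 1.96
\]
```

All predictors vary by item only, so the by-subject intercept v_j absorbs overall differences in reading accuracy between participants, while u_i absorbs item difficulty not explained by the 11 predictors.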
Group means ± standard deviation for demographic, cognitive and language characteristics. Numbers in brackets indicate the number of participants with the score when less than the total.

Table 2
Multiple regression predicting AmNART total error score from neuropsychological test results

Table 4
Consistent with our first hypothesis, MCI and AD participants performed significantly worse than controls in reading irregular words, controlling for sex, age and education. EMCI, LMCI and AD participants correctly read an average of 2.9, 3.8 and 7.4 fewer words, respectively, than HC. These figures are comparable to those of Weinborn and colleagues (2018) who, using the Wechsler Test of Adult Reading (WTAR, another 50-item irregular word test), found that MCI and AD participants read on average 3.0 and 7.4 fewer words, respectively, than HC. Consistent with hypothesis 2, results indicate that irregular word reading is correlated with general cognitive impairment / dementia severity. This relationship was similar in controls and in the different diagnostic categories, although, as expected, it became stronger further along the AD continuum, where larger variations in impairment appear. Taken together, these two sets of results indicate that irregular word reading performance declines throughout the AD continuum and that estimates of premorbid IQ based on this measure should not be considered accurate in patients with MCI and AD. Indeed, the assessment of premorbid IQ with the AmNART in participants on the AD continuum violates the criteria set by Taylor and colleagues in 1996, namely that an accurate estimate of premorbid IQ should (a) not significantly change as the disease progresses in severity and (b) not differ significantly from those of demographically matched control subjects. This has major clinical implications: given that irregular word reading estimates of premorbid IQ are inaccurate in this population, clinicians and researchers could be led to
(Jaroudi et al., 2017; Rao et al., 2022) accumulation of beta-amyloid proteins and phosphorylated tau (Chu, 2012) as well as presenting markedly more degeneration than in other neurodegenerative diseases affecting hippocampal volumes (Jaroudi et al., 2017; Rao et al., 2022). Since we found that AmNART scores are also highly correlated with disease severity, it is unsurprising that AmNART scores and hippocampal volumes would be strongly associated. In addition to the well-documented involvement of the hippocampus in episodic memory, research also shows that it could well be involved in semantic memory processes (Binder et al., 2009; Duff and Brown-Schmidt, 2012; Piai et al., 2016; Chapleau et al., 2019). The significant involvement of the ATL in irregular word reading performance in ROI-based analyses is consistent with previous results supporting the involvement of the ATL in irregular word reading (Wilson et al., 2012; Hoffman et al., 2015; Joyal et al., 2017; Ueno et al., 2018). These observations in AD patients are not dissimilar to ATL atrophy in semantic dementia, which is also accompanied by irregular word reading deficits (Woollams et al., 2007). However, single-word reading tasks like the AmNART have been hypothesized not to be demanding enough on the ATL (Taylor et al., 2013), which could explain the small effect size. Taken together, these results are consistent with the view of irregular word reading impairment as an indicator of general cognitive decline and semantic decline, as opposed to an indicator of premorbid intelligence.