Internal Consistency and Test-retest Reliability of the Visual Memory Associative Test TMA-93

Background: Memory tests focused on binding may be more sensitive for diagnosing Alzheimer's disease (AD) at an early phase. The TMA-93 examines binding using pairs of drawings of semantically related objects. Preliminary validation studies have confirmed good diagnostic accuracy for the test in discriminating between patients with cognitive impairment and healthy controls (HCs) in low-educated populations. This study aimed to evaluate the reliability (internal consistency and test-retest reliability) of the TMA-93 in a clinical setting with a high proportion of individuals with low educational attainment. Methods.-The study was undertaken in a memory clinic of a university hospital in Southern Spain. The internal consistency of the 10 items that compose the TMA-93 was estimated in a cross-sectional, case-control study. Thirty-five patients with amnestic Mild Cognitive Impairment and 40 HCs matched by age, gender, and educational attainment were tested with the TMA-93. Each participant's performance on each of the 10 items was scored from 0 to 3, according to the instructions for the test (maximum total score = 30 points). The internal consistency was estimated by Cronbach's alpha. In addition, "split-half reliability" (Spearman-Brown coefficient), "corrected item-total correlations", and item redundancy ("Cronbach's alpha if item deleted") were analyzed. The test-retest reliability of the TMA-93 total score was studied in 51 HCs tested by the same examiner 2-4 months apart and quantified by the intraclass correlation coefficient (ICC).

Conclusion.-The internal consistency and test-retest reliability of the TMA-93 are "good". Together with its other psychometric properties, these results support the use of the test for examining memory binding, particularly in contexts of low educational attainment.

Background
Binding, or associative learning, is the recall of an image or a word facilitated by exposure to a second image or word with which the first was paired and simultaneously encoded [1,2]. This is a key mechanism in memory formation that deteriorates when there is damage to the hippocampus and other structures of the medial temporal lobe [3,4]. Memory tests focused on binding may be more sensitive than episodic memory tests for diagnosing Alzheimer's disease (AD) at an early phase [5]. The binding cost, the proportion between performance in remembering isolated features (unbound task) and integrated features (bound task), is significantly higher in AD patients and may be a useful clinical marker of the disease [6].
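The text defines the binding cost only as a proportion between unbound and bound performance. The Python sketch below assumes one possible operationalization, the relative drop from the unbound to the bound task, (unbound - bound) / unbound; the function name and the formula are illustrative assumptions, not the method of reference [6].

```python
def binding_cost(unbound_score: float, bound_score: float) -> float:
    """Relative drop in performance from the unbound to the bound task.

    ASSUMPTION: binding cost = (unbound - bound) / unbound. Other studies
    may express the proportion differently.
    """
    if unbound_score <= 0:
        raise ValueError("unbound_score must be positive")
    return (unbound_score - bound_score) / unbound_score


# Hypothetical scores: 20 correct on the unbound task, 15 on the bound task.
print(binding_cost(20, 15))  # 0.25
```

Under this assumed definition, a larger value means proportionally worse performance when features must be remembered together.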
In neuropsychology, binding ability can be examined with different tests. The Wechsler Memory Scale (WMS) assesses binding through the learning and recall of paired associated words (Wechsler, 1987) [7]. This WMS subtest distinguishes between easy (e.g. North/South) and difficult associations (e.g. School/Cellar) [7]. The most recently published "Memory Binding Test" (MBT) examines associative memory through the recall of pairs of items that belong to the same semantic category (e.g. flea/ant = insects) but are presented in two different lists of words [8]. Both tests rely on verbal information and may be hard for low-educated patients. Testing binding with images rather than words could be more appropriate in individuals with a low education level. The Memory Associative Test of the district of Seine-Saint-Denis (TMA-93) was recently developed in France for the early diagnosis of AD among low-educated immigrants [9]. Briefly, during the encoding phase the patient is shown ten pairs of semantically related drawings of everyday objects (Fig. 1A). In the recall phase, only one of the two items is shown and the patient is asked to recall the missing item (Fig. 1B) (Maillet et al, 2017). In the original paper, the test demonstrated high diagnostic accuracy for discriminating AD patients from healthy controls in a sample of immigrant residents of a district of Paris (France) [9]. A subsequent validation study in older, educationally diverse Spanish people demonstrated that the test is as sensitive as the picture version of the Free and Cued Selective Reminding Test (FCSRT) in discriminating between amnestic Mild Cognitive Impairment (aMCI) patients and healthy controls (HCs) [10]. In that study, receiver operating characteristic (ROC) analysis yielded an area under the ROC curve (AUC) of 0.97 (95% CI, 0.89-1.00, p < 0.001) for distinguishing between the two groups [10].
Another asset of the TMA-93 is its short administration time. The average time needed for its completion was 6 min for aMCI patients and only 3 min for HCs [10]. In this sense, the test could be feasible for screening memory complaints in busy primary care and general neurology outpatient settings with limited face-to-face time per patient.
Normative studies of the TMA-93 are available for the French [11] and Spanish [12] populations. These studies show that the test has a ceiling effect that is moderated by age and education [12].
This ceiling effect has been considered an advantage for the test in diagnosing aMCI since a small number of errors can be a worrisome result for a patient [10].
These potential uses and advantages of the TMA-93 encourage completing the development of the test, including the study of other psychometric properties such as reliability. The aim of this study was to examine the internal consistency and test-retest reliability of the TMA-93.

Design
Internal consistency was evaluated in an extension of a cross-sectional, case-control study with convenience sampling originally designed to compare the discriminative validity of the TMA-93 to distinguish aMCI patients from HCs [10].
The test-retest reliability was evaluated in healthy controls who were randomly selected from an ongoing normative study [12] and invited to repeat the TMA-93 conducted by the same examiner between 2 and 4 months after the initial examination.

Study population
The sample of the case-control study consisted of 75 participants from an urban area of Spain, divided into two groups: 35 patients with aMCI and 40 HCs matched for age, gender, and educational level (incomplete primary studies, completed primary studies only, and higher than primary studies). All participants were older than 50 years and spoke Spanish as their native language. Patients were selected by convenience sampling of consecutive cases diagnosed with aMCI at the Memory Unit of the Hospital Virgen del Rocio (Seville, Spain). The diagnostic work-up had consisted of general, neurological, neuropsychological, laboratory, and neuroimaging examinations. The neuropsychological evaluation included the "Spanish Version of the Informant-Based AD8 Questionnaire" (AD8) [13], "Phototest" [14], "Delayed Matching-to-sample Task 48" (DMS-48) [15], "Free and Cued Selective Reminding Test" (FCSRT) [16], "Geriatric Depression Scale 15 items" (GDS-15) [17], and "Interview for Deterioration in Daily Living Activities in Dementia" (IDDD) [18]. The diagnosis of aMCI had been made according to the National Institute on Aging and Alzheimer's Association (NIA-AA) recommendations [19] and operationalized as follows: (a) memory complaints corroborated by a reliable informant (AD8 score of 3 or above), (b) objective memory impairment, defined as a score at or below the 10th percentile on set 2 of the DMS-48 (this score being lower than that on set 1), and (c) no significant functional decline in activities of daily living (IDDD scores up to 39 were allowed).
HCs for the case-control study were recruited among the caregivers and relatives of patients attending the center. They met the following inclusion criteria: (a) absence of memory complaints, (b) absence of objective memory impairment (DMS-48 set 2 score at or above the 25th percentile), and (c) an intact level of independence in activities of daily living (IDDD score between 33 and 36).
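The operational criteria for the two case-control groups can be expressed as simple predicates. This Python sketch is hypothetical: the `Screening` record and its field names are illustrative, DMS-48 percentiles are assumed to have been looked up from norms beforehand, and the HC criterion "absence of memory complaints" is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Hypothetical screening record; field names are illustrative."""
    ad8: int                 # informant-based AD8 total
    dms48_set1: int          # DMS-48 set 1 raw score
    dms48_set2: int          # DMS-48 set 2 raw score
    dms48_set2_pct: float    # percentile of the set 2 score (from norms)
    iddd: int                # IDDD total score
    memory_complaints: bool  # any memory complaint reported


def meets_amci(s: Screening) -> bool:
    complaint_corroborated = s.ad8 >= 3                          # criterion (a)
    objective_impairment = (s.dms48_set2_pct <= 10
                            and s.dms48_set2 < s.dms48_set1)     # criterion (b)
    no_functional_decline = s.iddd <= 39                         # criterion (c)
    return complaint_corroborated and objective_impairment and no_functional_decline


def meets_hc(s: Screening) -> bool:
    return (not s.memory_complaints                              # criterion (a)
            and s.dms48_set2_pct >= 25                           # criterion (b)
            and 33 <= s.iddd <= 36)                              # criterion (c)


patient = Screening(ad8=4, dms48_set1=44, dms48_set2=30, dms48_set2_pct=5,
                    iddd=38, memory_complaints=True)
print(meets_amci(patient), meets_hc(patient))  # True False
```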
HCs for the test-retest study were recruited among the participants in the Spanish normative study of the TMA-93 [12]. Inclusion criteria for that study were: 1) age 50 or above; 2) no cognitive complaints; 3) a Phototest score at or above the 10th percentile according to Spanish normative data (Carnero Pardo et al, 2018); and 4) an independent level of functioning [12]. Fifty-one randomly selected HCs were invited to repeat the TMA-93, administered by the same examiner 2 to 4 months after the initial examination, for the test-retest reliability study.

Instrument: TMA-93
In both studies, the TMA-93 was administered following the instructions given by its authors [9]. During the encoding phase, subjects were shown and asked to name 10 pairs of semantically related real-life objects presented as drawings on cards (tree/bird, bed/bedside lamp, boat/fish, dog/sheep, foot/trousers, knife/apple, glasses/book, hand/watch, car/car keys, flower/sun). The examiner specifically asked the participants to memorize the pairs of drawings (Fig. 1A). Next, the first associative memory trial was administered: examinees were shown only one of the two objects of each pair and asked to recall the missing one (Fig. 1B). After each response (regardless of accuracy), or after up to 5 seconds without a response, the pair was displayed again. This procedure was repeated for the 9 remaining pairs. The maximum score of 30 points was granted only when the participant produced 10 correct responses in this first trial, in which case the second and third trials were omitted. Otherwise, participants were scored from 0 to 9 according to their number of correct answers in the first trial and were administered a second, similar trial with the same 10 pairs of drawings. If a subject correctly recalled the 10 missing objects in this second trial, s/he was given 20 points: 10 points for the second trial and 10 more for the third trial, which was cancelled.
The score of each of the 10 items of the TMA-93 ranged from 0 to 3 and these scores were used for estimating the internal consistency of the test.
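The scoring rule described above can be summarized as a short function that returns both the 0-3 item scores used for internal consistency and the 0-30 total. This Python sketch is hypothetical (the function name and input format are ours) and assumes that a third trial is administered whenever the second is imperfect, and that each cancelled trial credits every item with one point, which is how the 30- and 20-point rules add up.

```python
from typing import Sequence

N_PAIRS = 10
MAX_TRIALS = 3


def score_tma93(trials: Sequence[Sequence[bool]]) -> tuple[list[int], int]:
    """Return (item scores 0-3, total score 0-30) for one administration.

    trials[t][i] is True if the missing object of pair i was recalled in
    trial t + 1. ASSUMPTIONS: a third trial is given whenever the second is
    imperfect, and each trial cancelled after a perfect one credits every
    item with 1 point.
    """
    item_scores = [0] * N_PAIRS
    for t, trial in enumerate(trials, start=1):
        if len(trial) != N_PAIRS:
            raise ValueError("each trial must contain 10 responses")
        for i, correct in enumerate(trial):
            item_scores[i] += int(correct)
        if all(trial):
            if t != len(trials):
                raise ValueError("no trial is given after a perfect trial")
            skipped = MAX_TRIALS - t  # cancelled trials count as fully correct
            item_scores = [s + skipped for s in item_scores]
            return item_scores, sum(item_scores)
    if len(trials) != MAX_TRIALS:
        raise ValueError("three trials are needed when none is perfect")
    return item_scores, sum(item_scores)


# Trial 1: 7 pairs recalled; trial 2: all 10 recalled, so trial 3 is cancelled.
trial1 = [True] * 7 + [False] * 3
items, total = score_tma93([trial1, [True] * 10])
print(items, total)  # [3, 3, 3, 3, 3, 3, 3, 2, 2, 2] 27
```

In this example the total equals the first-trial score (7) plus the 20 points granted for a perfect second trial.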

Statistical analyses
Descriptive results are shown as frequency (percentage) for dichotomous and categorical variables, mean (± SD, range) for normally distributed continuous variables, and median [interquartile range (IQR), range] for non-normally distributed continuous variables. Between-group comparisons of continuous variables were performed with Student's t test or its non-parametric alternative, the Mann-Whitney U test. Between-group comparisons of categorical variables were performed with the Chi-square test. The diagnostic accuracy of the TMA-93 was estimated by the area under the curve (AUC) using receiver operating characteristic (ROC) curve analysis and classified as excellent (> 0.9), good (0.8-0.9), or poor (< 0.8).
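The AUC equals the Mann-Whitney U statistic divided by the product of the two group sizes, so it can be computed directly from pairwise comparisons between groups. The following Python sketch illustrates this together with the accuracy bands above; the function names and scores are hypothetical, and it returns the point estimate only (no confidence interval or p value).

```python
def roc_auc(controls: list[float], patients: list[float]) -> float:
    """AUC as the probability that a randomly chosen control outscores a
    randomly chosen patient, counting ties as one half.

    ASSUMPTION: higher TMA-93 scores indicate better memory.
    """
    wins = 0.0
    for c in controls:
        for p in patients:
            if c > p:
                wins += 1.0
            elif c == p:
                wins += 0.5
    return wins / (len(controls) * len(patients))


def classify_auc(auc: float) -> str:
    """Accuracy bands used in this study: > 0.9, 0.8-0.9, < 0.8."""
    if auc > 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    return "poor"


# Hypothetical TMA-93 totals.
auc = roc_auc([30, 30, 29, 28], [22, 25, 27, 28])
print(auc, classify_auc(auc))  # 0.96875 excellent
```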
Internal consistency was estimated by Cronbach's alpha. Values of Cronbach's alpha above 0.70 were considered acceptable, values between 0.90 and 0.95 "optimal", and values above 0.95 were interpreted as indicative of "item redundancy" [20,21]. In addition, "split-half reliability" was analyzed by treating the first five pairs of drawings of the TMA-93 as one half and the last five as the other, and adjusting the correlation between the two halves with the Spearman-Brown coefficient. "Corrected item-total correlations" were calculated, and a value below 0.40 was considered indicative of item redundancy [21]. Item redundancy was also evaluated by "Cronbach's alpha if item deleted" [22], considering an item redundant if Cronbach's alpha increased when it was deleted.

Results
Socio-demographic characteristics and neuropsychological background of the cross-sectional study sample are shown in Table 1. Of the total sample (n = 75), 46 participants were female. Their mean age was 74.6 years (SD = 5.9, range = 51-84). With regard to educational attainment, 31 individuals (41.3%) had not completed primary studies (Table 1). Internal consistency was "optimal" (Cronbach's alpha = 0.936). Split-half reliability, treating the first five pairs of drawings of the TMA-93 as one half and the last five as the other, was also high (Spearman-Brown coefficient = 0.911). Corrected item-total correlations ranged from 0.661 for the pair "hand-watch" to 0.837 for the pair "flower-sun" (Table 2). No item was redundant, as Cronbach's alpha did not increase when any item was deleted (Table 2).
Table 2 legend: Corrected Item-Total Correlation was never lower than 0.400. Cronbach's alpha was not above 0.936 when any item was deleted. Both results demonstrated no redundancy of any item.
Table 1 legend: Results are shown as median, interquartile range, and (range) for non-normally distributed variables and mean ± SD, and (range) for normally distributed variables.

Discussion
To our knowledge, this is the first study focused on the internal consistency and test-retest reliability of the TMA-93, the French visual binding test [9]. Internal consistency was optimal and test-retest reliability was good.
Internal consistency among the 10 pairs of semantically related drawings of real-life objects that compose the test was optimal (Cronbach's alpha = 0.936). This result means that the 10 items of the test are highly correlated with each other and measure the construct of interest, visual binding, in a homogeneous way [23].
The "Corrected Item-Total Correlation" is the correlation of a given item with the summed score of all other items. A rule of thumb states that this value should be at least 0.40 to rule out item redundancy [21]. Every item of the TMA-93 fulfilled this rule. Likewise, Cronbach's alpha did not increase when any of the ten pairs was deleted, so, again, no item redundancy was found.
Split-half testing is another measure of internal consistency. This method measures the extent to which all parts of the test contribute equally to what is being measured. Here, the TMA-93 was virtually split into two halves and the scores from both halves were correlated. We found a strong correlation between the two halves, indicating that participants of the cross-sectional study performed equally well (or equally poorly) on both halves of the test.
The TMA-93 showed "good" test-retest reliability, as measured by the intraclass correlation coefficient [ICC = 0.802 (95% CI = 0.653-0.887)]. Good test-retest reliability supports the consistency of the TMA-93 and indicates that the measurements obtained in one setting are both representative and stable over time [24].
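The Statistical analyses section does not specify the ICC model. Assuming the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss's notation, the coefficient is (MS_R - MS_E) / (MS_R + (k - 1) MS_E + k (MS_C - MS_E) / n), where MS_R, MS_C, and MS_E are the participant, session, and residual mean squares, n is the number of participants, and k the number of sessions. A minimal Python sketch follows; the function name and data are hypothetical, and it returns the point estimate only, without a confidence interval.

```python
from statistics import fmean


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores has one row per participant and one column per session
    (here: [test, retest]). ASSUMPTION: the model choice is ours.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]        # participants
    col_means = [fmean(col) for col in zip(*scores)]  # sessions
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: every retest score is 2 points higher than the test score.
shifted = [[20, 22], [25, 27], [30, 32]]
print(round(icc_2_1(shifted), 3))  # 0.926: the systematic shift lowers agreement
```

For these data a consistency-type ICC, ICC(3,1), would equal 1.0, because it ignores the systematic 2-point shift between sessions; which form the study used is not stated.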
Analysis of the distribution of the "total score at time 2 minus total score at time 1" variable revealed four atypical observations. These four outliers probably prevented the test-retest reliability of the TMA-93 from reaching the "excellent" category. Two of them scored 3 points more at retest. Conversely, the other two scored 2 and 3 points less, respectively, at retest. The former could be explained by a practice effect and the latter by cognitive decline, but a more general explanation could be that binding is somewhat changeable and dynamic, making better test-retest reliability difficult to achieve.
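The text does not state how the four atypical observations were identified. One common choice, assumed here, is Tukey's box-plot rule applied to the retest-minus-test differences. The Python sketch below uses a hypothetical function name and invented scores.

```python
from statistics import quantiles


def retest_outliers(test: list[int], retest: list[int], k: float = 1.5) -> list[int]:
    """Indices of participants whose (retest - test) difference falls outside
    Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    ASSUMPTION: Tukey's box-plot rule; the source does not name its method.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    q1, _, q3 = quantiles(diffs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, d in enumerate(diffs) if d < low or d > high]


# Hypothetical TMA-93 totals for ten HCs at time 1 and time 2.
time1 = [28, 29, 30, 27, 28, 29, 30, 28, 26, 30]
time2 = [28, 30, 30, 27, 29, 29, 30, 28, 29, 27]
print(retest_outliers(time1, time2))  # [8, 9]
```

With a strong ceiling effect many differences are 0, so the IQR can shrink to 0 and every non-zero difference would then be flagged.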
Both samples included a relatively high percentage of low-educated participants. Lack of education remains a limitation for many elderly Spanish people, who had limited access to primary school in the aftermath of the Spanish Civil War (1936-1939). Although the situation has improved significantly in recent years, 59% of the population over 65 years of age in Spain did not complete primary studies [25]. Neuropsychological examination must take this limitation into account. Accordingly, the design of this study included image-based tests such as the Phototest [14], the DMS-48 [15], and the picture version of the FCSRT [16]. As a specific memory test, the TMA-93 has the advantage of a shorter administration time than the DMS-48 and the FCSRT. Here, the test again proved accurate and feasible to administer to low-educated individuals. Despite this feasibility, lower TMA-93 total scores should be expected in low-educated individuals according to normative studies [11,12].
In addition to the optimal diagnostic accuracy previously reported for the TMA-93, the good reliability demonstrated here encourages completing the development of the test. The next steps will be phase II and phase III validation studies including AD biomarkers and comparing the diagnostic accuracy of the test with that of standard memory instruments in samples organized by educational attainment.

Conclusion
This study of the internal consistency and test-retest reliability of the TMA-93 shows good reliability for the instrument. Together with its diagnostic accuracy and short administration time, this good reliability is another reason to choose the TMA-93 for testing binding, particularly in individuals with low educational attainment.

Declarations
Ethics approval and consent to participate
Both studies were approved by the ethics committee of the Hospital Virgen del Rocio (Seville, Spain) and conducted according to the World Medical Association Declaration of Helsinki. All participants accepted the study procedures by signing an informed consent.

Consent for publication
It was included in the informed consent.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
Didier Maillet is the author of the TMA-93.

Funding
The test-retest study was based on a normative study supported by Hoffmann-La Roche.
Authors' contributions
SRH and EFM designed the study and wrote the paper. EGC, ALT, and SRH collected the data. EFM and MBS were responsible for carrying out the statistical analysis. D. Maillet assisted with writing the article.

Figure 1: … (Figure 1A). In the recall phase, the subject has to recall the missing object (Figure 1B).