The Anatomical Correlates of Abstract and Concrete Words: A meta-analytical review of whole-brain imaging studies

Several studies have investigated how abstract and concrete concepts are processed in the brain, but the data are controversial; in particular, neuroimaging data contrast with clinical neuropsychological observations. A possible explanation is that previous meta-analyses considered different types of stimuli (nouns, verbs, literal and figurative sentences). Using the ALE method, we meta-analyzed 32 brain-activation imaging studies that considered only words (nouns and verbs). Five clusters were associated with concrete words (the left superior occipital, middle temporal, and parahippocampal gyri, and the bilateral posterior cingulate, angular, and precuneus gyri); four clusters were associated with abstract words (left IFG, superior and middle temporal gyri). When only nouns were considered, three left-hemisphere activation clusters were associated with concrete stimuli and only one with abstract nouns (left IFG). These results confirm that concrete and abstract word processing involves at least partially segregated brain areas, the IFG being relevant for abstract nouns and verbs, while more posterior temporo-parieto-occipital regions seem to be crucial for concrete words.

The present meta-analysis aimed at addressing the following research questions: (i) what are the neural correlates of concrete and abstract words, i.e., which regions are consistently activated across experiments that required participants to process abstract and concrete words? (ii) how might the results vary depending on the type of material used (noun or verb stimuli), the fMRI recording task (e.g., semantic judgments, lexical decision, etc.), and the modality of presentation (visual or auditory)? The rationale for these sub-analyses is based on the fMRI literature suggesting that stimulus type, presentation modality, and task could affect the pattern of activation, since minor variations in paradigms can produce large changes in cognitive strategies 32,33 .

objective, but the results can be biased by the type of contrast applied; indeed, discrepancies in the patterns of cortical activation across studies may be attributable, at least in part, to differences in baseline tasks, and hence reflect the limits of the subtractive logic.
For these reasons, we did not include the same studies that were included in the previously mentioned meta-analyses.
1. We used a different method, choosing the more popular Activation Likelihood Estimation (ALE) [35][36][37] rather than the multilevel kernel density analysis (MKDA) 38 applied by Wang et al. 31 . MKDA and ALE produce similar results, as both use the locations (xyz-coordinates) of the local maxima reported by the individual studies, but MKDA uses a spherical kernel whose radius is chosen by the analyst 39 , whereas ALE applies a Gaussian kernel whose FWHM is empirically determined. Moreover, our analyses were conducted with the latest version of the GingerALE software, which rectifies some of the earlier limitations of this tool (e.g., the frequently used FDR correction is no longer supported 37 ) and implements new best-practice ALE recommendations, such as the cluster-level family-wise error (FWE) corrected threshold of p < .05 40 .
2. Our results update the previous reviews by including publications from the last 10 years.
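The kernel difference described in point 1 can be illustrated with a short NumPy sketch. It is not GingerALE or MKDA code: the 10 mm radius and 10 mm FWHM are arbitrary illustrative values (in GingerALE the FWHM is set empirically and depends on each study's sample size). The spherical kernel gives the same weight to every location within the radius, whereas the Gaussian weight decays smoothly; at half the FWHM from the peak it is exactly half the peak value.

```python
import numpy as np

def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian full width at half maximum (mm) to its SD (mm)."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def spherical_kernel(dist_mm: np.ndarray, radius_mm: float) -> np.ndarray:
    """MKDA-style indicator kernel: 1 inside an analyst-chosen radius, 0 outside."""
    return (dist_mm <= radius_mm).astype(float)

def gaussian_kernel(dist_mm: np.ndarray, fwhm_mm: float) -> np.ndarray:
    """ALE-style kernel: isotropic 3D Gaussian density at each distance from a focus."""
    sigma = fwhm_to_sigma(fwhm_mm)
    norm = (2.0 * np.pi * sigma ** 2) ** -1.5
    return norm * np.exp(-dist_mm ** 2 / (2.0 * sigma ** 2))

# Illustrative values only (assumed, not taken from the paper).
distances = np.array([0.0, 5.0, 10.0, 15.0])
print(spherical_kernel(distances, radius_mm=10.0))  # [1. 1. 1. 0.]
g = gaussian_kernel(distances, fwhm_mm=10.0)
print(g / g[0])  # relative weight falls off smoothly; 0.5 at 5 mm (= FWHM / 2)
```

The practical consequence is that an MKDA result depends on a discrete radius choice, whereas in ALE the spatial uncertainty of each focus is expressed as a continuous probability.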

Materials And Methods
The present systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 41 .
Titles, abstracts, and full-text articles were screened and evaluated for eligibility based on the following exclusion criteria: presentations from international meetings with no specific data provided; perspective and opinion publications; case reports and case series; previous reviews or meta-analyses; studies not published in, or translated into, English; studies using phrases or sentences as stimuli; and studies without adequate information (e.g., stereotaxic coordinates) to analyze the concrete vs. abstract contrasts, when the authors did not reply to our request for the missing data.
As previously specified, we looked for publications that reported concrete > abstract or abstract > concrete word contrasts, without analyzing the exact strategies that the authors applied to divide word stimuli into the two categories. In the papers, the abstractness/concreteness constructs are often operationalized with one of two rating methods: (1) asking participants to classify a word as concrete according to the degree to which it refers to a tangible entity in the world (i.e., it has clear references to material objects), or (2) evaluating its imageability, i.e., the ease with which the word elicits a mental image.
Generally speaking, words referring to something that exists in reality, and of which one can have an immediate experience through the senses, are considered concrete (e.g., animals, tools), while words whose meaning cannot be experienced directly but can be defined by other words, internal sensory experience, and linguistic information are classified as abstract (e.g., emotions, morality, social interaction, time).
After removing duplicates, research papers that did not satisfy the above criteria were excluded. For example, several studies focused on sentences or phrases 42,43 , or reported only word > baseline contrasts 44 . The more conservative concrete > abstract and abstract > concrete contrasts (as opposed to concrete > baseline or abstract > baseline contrasts) were chosen to avoid a variety of baselines, ranging from resting state and fixation cross 45 to pseudowords 46 and numbers or letters 47 , which could affect the interpretation of the results, since subtractions from different baselines create different activation patterns.
If the same data were reported in different publications, we chose the most recent one with the highest number of participants 48,49 .
Uncertainties regarding the inclusion of some studies were resolved by the authors through discussion. The PRISMA flow diagram was used to track the search process, as presented in Fig. 1, and the main characteristics of the studies included in this meta-analysis are reported in Table 1. The p values (the statistical thresholds for the univariate neuroimaging analyses conducted in the included papers) are reported as they were presented in the original articles; the exact value and the correction procedure were not always specified.
2.2 Classification of the raw data before clustering analyses
From the selected papers, only the stereotactic coordinates representing the concrete > abstract or abstract > concrete contrasts were extracted. Following this procedure, we obtained 295 foci from a total sample of 535 participants. The stereotaxic coordinates reported in terms of the Talairach and Tournoux atlas 50 were transformed into the MNI (Montreal Neurological Institute) stereotaxic space 51 using the tal2icbm transforms implemented in the GingerALE software 35,37,52 .
For all the stereotaxic coordinates, we extracted the relevant information about the statistical comparisons that generated them. More explicitly, we recorded the MNI coordinates (MNI x, y, z), the name of the first author, the journal and year of publication of the paper, the technique (PET or fMRI) and the stereotactic space used, the age of participants, the type of task, the nature of the contrast from which the peak was extracted, the statistical thresholds, the stimulus type (nouns or verbs), and the presentation modality (auditory or visual).
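The per-focus information listed above can be stored as one record per peak. The following Python sketch uses hypothetical field names (the paper does not describe its data format) and groups foci by study and contrast, assuming that each (study, contrast) pair is entered into ALE as one experiment.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Focus:
    """One extracted activation peak plus its study-level metadata.
    All field names are hypothetical, chosen for illustration."""
    x: float              # MNI coordinates (mm), after any Talairach conversion
    y: float
    z: float
    first_author: str
    journal: str
    year: int
    technique: str        # "PET" or "fMRI"
    original_space: str   # space reported in the paper: "MNI" or "Talairach"
    mean_age: float
    task: str             # e.g. "lexical decision", "semantic judgment"
    contrast: str         # "concrete>abstract" or "abstract>concrete"
    threshold: str        # statistical threshold as reported in the paper
    stimulus: str         # "noun" or "verb"
    modality: str         # "visual" or "auditory"
    n_participants: int

def group_by_experiment(foci):
    """Collect the coordinates of each (study, contrast) pair into one experiment."""
    groups = defaultdict(list)
    for f in foci:
        groups[(f.first_author, f.year, f.contrast)].append((f.x, f.y, f.z))
    return dict(groups)
```

Keeping the metadata attached to every focus is what later allows the dataset to be split by stimulus type, task, or modality without re-extracting coordinates.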

Clustering Procedure
Once the set of MNI coordinates was obtained, the meta-analyses were carried out using the revised ALE algorithm 35,37 implemented in GingerALE software version 3.0.2 52 (http://brainmap.org/ale). The ALE algorithm aims to identify areas where the convergence of reported coordinates across experiments is higher than expected under a random spatial association. This approach models a spatial probability distribution for each activation peak included in the dataset of interest: reported foci are treated as the centers of 3D Gaussian probability distributions capturing the spatial uncertainty associated with each focus 52 . The between-subject variance is weighted by the number of participants per study, since larger samples should provide more reliable approximations of the "true" activation effect. The voxel-wise union of these distributions is used as an activation likelihood map, which is then tested for statistical significance against randomly generated sets of foci. ALE has proven to be a reliable way of combining evidence from multiple studies 37 and has been used successfully in different fields (e.g., 53 ).
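The core ALE computation described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the GingerALE implementation: the FWHM is a fixed assumed value rather than one derived from each study's sample size, the grid is a small toy volume rather than an MNI brain mask, and the foci are invented. Foci within one experiment are combined with a voxel-wise maximum (mirroring the modified ALE idea that one experiment should not reinforce itself), and the experiments' modeled activation maps are then merged as a probabilistic union.

```python
import numpy as np

def modeled_activation(grid_mm, foci_mm, fwhm_mm, voxel_volume_mm3):
    """Modeled activation (MA) map for one experiment.

    Each focus is the centre of a 3D Gaussian; density x voxel volume
    approximates the probability that the focus lies in each voxel.
    Foci of the same experiment are merged with a voxel-wise maximum."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    norm = voxel_volume_mm3 * (2.0 * np.pi * sigma ** 2) ** -1.5
    ma = np.zeros(grid_mm.shape[:-1])
    for focus in foci_mm:
        d2 = np.sum((grid_mm - np.asarray(focus, dtype=float)) ** 2, axis=-1)
        ma = np.maximum(ma, norm * np.exp(-d2 / (2.0 * sigma ** 2)))
    return ma

def ale_map(grid_mm, experiments, fwhm_mm, voxel_volume_mm3):
    """Voxel-wise union of MA maps: ALE = 1 - prod_i (1 - MA_i)."""
    not_active = np.ones(grid_mm.shape[:-1])
    for foci in experiments:
        not_active *= 1.0 - modeled_activation(grid_mm, foci, fwhm_mm, voxel_volume_mm3)
    return 1.0 - not_active

# Toy 2-mm grid from -20 to +20 mm on each axis (21 x 21 x 21 voxels).
axis = np.arange(-20.0, 21.0, 2.0)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
# Three hypothetical experiments: two converge near the origin, one lies elsewhere.
experiments = [[(0, 0, 0)], [(0, 0, 0), (2, 0, 0)], [(16, 16, 16)]]
ale = ale_map(grid, experiments, fwhm_mm=10.0, voxel_volume_mm3=8.0)
print(ale[10, 10, 10] > ale[18, 18, 18])  # True: convergence yields a higher ALE value
```

The statistical question ALE then asks is whether such convergence exceeds what randomly placed foci would produce, which is what the permutation-based thresholding addresses.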
More specifically, we used the following procedure:
- anatomical filtering: we first filtered the coordinates using the most conservative (smallest) mask available in the GingerALE software; 17 of the 295 foci fell outside the mask;
- ALE maps (quantifying the degree of overlap in peak activation across experiments) were calculated using the modified ALE algorithm and the random-effects model 35,37 ;
- thresholding procedure: for each ALE calculation described below, significance was tested using 1000 permutations with a cluster-forming threshold of p < 0.001 (uncorrected). To control for false positives, significance was corrected with a cluster-level family-wise error threshold of p < 0.05 40 , as in other meta-analytic studies 54 .
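The cluster-level FWE logic of the thresholding step can be sketched as follows. The sketch assumes that the null distribution of maximum cluster sizes has already been obtained from permutations with randomly relocated foci (1000 in the actual analysis) and that the voxel-level cutoff is the ALE value corresponding to the cluster-forming threshold of p < 0.001; both are toy inputs here. The 26-neighbour connectivity is an assumption of this sketch, and SciPy's `ndimage.label` performs the cluster labeling.

```python
import numpy as np
from scipy import ndimage

def cluster_extent_threshold(null_max_cluster_sizes, alpha=0.05):
    """Cluster-size cutoff for cluster-level FWE correction.

    Each entry is the largest supra-threshold cluster found in one null
    ALE map; the (1 - alpha) quantile is the size a real cluster must reach."""
    return float(np.quantile(np.asarray(null_max_cluster_sizes), 1.0 - alpha))

def surviving_clusters(ale, voxel_cutoff, min_size):
    """Label clusters above the cluster-forming cutoff and keep the large ones."""
    # 26-connectivity in 3D (assumed; face-only connectivity is another option)
    labels, n = ndimage.label(ale > voxel_cutoff, structure=np.ones((3, 3, 3)))
    sizes = np.bincount(labels.ravel())  # sizes[0] counts background voxels
    return [k for k in range(1, n + 1) if sizes[k] >= min_size]

# Toy example: a 27-voxel blob and one isolated supra-threshold voxel.
ale = np.zeros((10, 10, 10))
ale[2:5, 2:5, 2:5] = 0.02
ale[8, 8, 8] = 0.02
min_size = cluster_extent_threshold(range(1, 11))  # toy null distribution
print(surviving_clusters(ale, voxel_cutoff=0.01, min_size=min_size))  # [1]
```

Only the blob survives: the isolated voxel passes the voxel-level cutoff but is smaller than clusters that arise by chance in the null maps, which is precisely what the cluster-level correction is meant to reject.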
Unfortunately, ALE cannot handle designs with multiple independent variables, whereas in this paper we intended to consider the role of different variables such as (i) stimulus type (nouns only, verbs only, or all word stimuli), (ii) modality of presentation (visual only, auditory only, or both visual and auditory), and (iii) task specificity (e.g., lexical tasks, semantic tasks, or all tasks). The ALE strategy we chose in this case was to consider separate sets of foci for each variable and to run one meta-analysis for each set when the number of papers was large enough. To this end, the overall dataset was divided into several subsets, which necessarily implied running meta-analyses on a lower number of foci (and thus with lower power). An important limitation of this approach is that we cannot statistically assess the interaction between variables such as stimulus type and task.
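The subset strategy described above amounts to grouping experiments by one moderator at a time and running a separate ALE only where enough experiments remain. A minimal Python sketch, with hypothetical dictionary keys and a hypothetical `min_experiments` cutoff (the paper does not state a numeric rule):

```python
from collections import defaultdict

def split_into_subsets(experiments, key, min_experiments):
    """Group experiments by one moderator (e.g. stimulus type) and keep only
    the subsets large enough to run a separate ALE meta-analysis.

    `experiments` is a list of dicts; `key` and `min_experiments` are
    illustrative choices, not values reported in the paper."""
    subsets = defaultdict(list)
    for exp in experiments:
        subsets[exp[key]].append(exp)
    return {level: exps for level, exps in subsets.items()
            if len(exps) >= min_experiments}

# Hypothetical toy data: three noun experiments, one verb experiment.
toy = [{"stimulus": "noun"}, {"stimulus": "noun"},
       {"stimulus": "noun"}, {"stimulus": "verb"}]
print(sorted(split_into_subsets(toy, "stimulus", min_experiments=2)))  # ['noun']
```

Because each subset is analyzed on its own, this scheme, like the approach it illustrates, cannot test interactions such as stimulus type by task.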
The analyses were based on the following contrasts: (i) an analysis including the activation peaks associated with word processing independently of stimulus type and task; (iv) an analysis of peaks associated with lexical (word vs. non-word classification) or semantic decision tasks (e.g., pleasantness decision, answering a question about the stimuli), excluding all studies based on memory tasks (2 studies), perceptual decision tasks (1 study), mental image generation (3 studies), and passive reading (2 studies).
The lexical and semantic task subsets included:
- concrete > abstract words (only lexical and semantic tasks): 114 stereotactic activation loci from 16 studies (273 participants);
- abstract > concrete words (only lexical and semantic tasks): 116 stereotactic activation loci from 17 studies (289 participants).

For anatomical labeling and figures, we used the Automated Anatomical Labeling (AAL) template available in the MRIcron visualization software (https://www.nitrc.org/projects/mricron).

Results
Once the appropriate studies were collected, we used activation likelihood estimation (ALE) to meta-analytically remodel available neuroimaging data.

CONCRETE > ABSTRACT Meta-analysis
The GingerALE procedure run over the concrete words > abstract words set of coordinates identified a total of 5 clusters, with 1 to 4 individual peaks each, from 4 to 11 different studies (Fig. 2). Regions consistently activated across experiments were located in the bilateral middle temporal gyrus and posterior cingulate, the left parahippocampal gyrus, left fusiform gyrus, bilateral precuneus and angular gyri, left superior occipital gyrus, and left cerebellar culmen. The distribution of peaks for each significant cluster is reported in Table 2.
Abbreviations: H = hemisphere; ALE = activation likelihood estimation; Nr. = number of studies that contributed to each cluster; L = left; R = right; BA = Brodmann area; ** = number of foci from each study that contributed to that specific cluster (in brackets).
A similar activation pattern, except for the right-hemisphere involvement, was observed when only studies using exclusively noun stimuli were considered (concrete nouns > abstract nouns). We observed three left-hemisphere activation clusters (Fig. 3, Table 3) located in the middle temporal gyrus, parahippocampal gyrus, posterior cingulate, precuneus, superior occipital gyrus, and culmen (left cerebellum anterior lobe).
Abbreviations: H = hemisphere; ALE = activation likelihood estimation; Nr. = number of studies that contributed to each cluster; L = left; BA = Brodmann area; ** = number of foci from each study that contributed to that specific cluster (in brackets).

The ALE procedure run over the concrete words > abstract words set restricted to visual stimuli identified a total of 5 clusters, with 1 to 6 individual peaks each, from 4 to 8 different studies (Fig. 4). Regions consistently activated across experiments were located in the left middle temporal gyrus, bilateral posterior cingulate, and parahippocampal gyrus, left fusiform gyrus, bilateral precuneus and angular gyri, left superior occipital gyrus, and left cerebellar culmen. The distribution of peaks for each significant cluster is reported in Table 4.
Table 5).
Abbreviations: H = hemisphere; ALE = activation likelihood estimation; Nr. = number of studies that contributed to each cluster; L = left; R = right; BA = Brodmann area; ** = number of foci from each study that contributed to that specific cluster (in brackets).

ABSTRACT > CONCRETE Meta-analysis
The revised ALE algorithm identified four clusters associated with abstract word processing in a healthy population (Fig. 6), from 4 to 12 different papers (Table 6). Our analyses identified a robust neural pattern of activity in the left frontal and temporal lobes, specifically the inferior frontal gyrus, the superior and middle temporal gyri, and the left inferior parietal lobule.
** = number of foci from each study that contributed to that specific cluster (in brackets).

When only abstract nouns were analyzed (abstract nouns > concrete nouns), the results indicated a single cluster with two peaks, from 9 studies, in the left inferior frontal gyrus (Fig. 7, Table 7).
** = number of foci from each study that contributed to that specific cluster (in brackets).

We identified three clusters associated with abstract word processing in a healthy population when only studies using visual stimuli were included (Fig. 8), from 4 to 12 different papers (Table 8). Our analyses revealed a robust neural pattern of activity in the frontal and temporal lobes, specifically the inferior frontal gyrus and the superior and middle temporal gyri.
Abbreviations: H = hemisphere; ALE = activation likelihood estimation; Nr. = number of studies that contributed to each cluster; L = left; BA = Brodmann area; ** = number of foci from each study that contributed to that specific cluster (in brackets).

When only foci from lexical and semantic tasks were analyzed, the results indicated 2 clusters (with 1 to 4 individual peaks each, from 3 to 9 different studies) in the left inferior frontal gyrus and the superior and middle temporal gyri (Fig. 9 and Table 9).

Discussion
As we pointed out in the introduction, neuropsychological studies suggest a role of the lateral prefrontal cortex in processing abstract words and of the left anterior temporal lobe in processing concrete ones. We therefore ran a meta-analysis to assess whether imaging data confirm this segregation.
Thirty-two imaging studies were included, which evaluated the activation patterns in response to concrete and abstract concepts in order to determine whether their processing recruits separate brain circuits and, if so, where those specific areas are located in the brain. All the data included in the ALE analysis are based on general linear model (GLM) analyses.
We also looked for studies that used the more recent multivariate pattern analysis, i.e., a set of methods that analyze neural responses as patterns of activity 85 , in order to build a separate dataset for this type of method. Unfortunately, we found too few publications to support a separate meta-analysis 42,86,87 .
The results of this meta-analysis, consistent with those of previous research 31, 34 , confirmed that concrete and abstract word processing relies, at least in part, on different brain regions. The ALE procedure was entirely data-driven, without a priori theoretical assumptions, and the results are constrained only by the nature of our data (e.g., the limited temporal resolution of the neuroimaging techniques and the correlational nature of the data) and by our inclusion/exclusion criteria.
As previously mentioned, experiments testing for greater activation for concrete than abstract words (concrete words > abstract words) converge in temporo-parieto-occipital regions, namely the left middle temporal gyrus, left fusiform, left parahippocampal and lingual gyri, bilateral angular gyrus and precuneus, bilateral posterior cingulate, left superior occipital gyrus, and left culmen in the cerebellum. The neuroimaging evidence indicates that concrete concept representations are at least partly associated with the perceptual system and also rely on mental imagery (precuneus, superior occipital gyrus).
Binder et al. 34 found significant overlap for concrete stimuli in the angular gyrus bilaterally, left mid-fusiform gyrus, left posterior cingulate, and left dorsomedial prefrontal cortex (DMPFC). With the exception of the DMPFC, whose activation might be related to stimulus complexity and/or different baselines, all the other regions are confirmed by our data. Unlike Wang et al.'s meta-analysis 31 , we found bilateral involvement of the posterior cingulate cortex, angular gyrus, and precuneus. Binder et al. 34 found that the angular gyrus was the most reliably activated area across the 120 studies included in their meta-analysis and interpreted this as an indication of its involvement in the semantic representation of concrete concepts. Another area activated for concrete > abstract concepts was the bilateral posterior cingulate cortex (PCC). Although the PCC is involved in many semantic tasks, its function in semantic cognition is still debated. Three hypotheses have been proposed: (i) this region could act as a supramodal convergence zone 34 ; (ii) PCC activation could reflect the greater engagement of an imagery-based perceptual system for concrete stimuli; or (iii) the PCC might be an interface between semantic knowledge and episodic memory 86 . The precuneus also seems to be associated with visuospatial imagery, a hypothesis supported by experiments on episodic memory retrieval and by linguistic tasks that required the processing of high-imagery words or mental image generation 78 .
The same regions were found when only nouns were considered (concrete nouns > abstract nouns contrast), with the difference that the right-hemisphere activation disappeared. The two right-hemisphere clusters might be specifically related to action verbs, but this result could also be a consequence of the lack of power due to the limited number of studies (15 studies in the nouns dataset vs. 22 in the noun-and-verb dataset).
The results on abstract words replicated those reported by Wang and colleagues 31 and Binder et al. 34 : higher activation for abstract compared to concrete words (abstract words > concrete words) is more frequently reported in a left-lateralized network encompassing the inferior frontal gyrus (IFG, Brodmann areas 45 and 47), a very small portion of the precentral gyrus, the superior and middle temporal gyri, and the inferior parietal lobule.
Concerning the left IFG, it has been suggested that the ventrolateral prefrontal cortex (VLPFC) implements semantic control in two steps 88 .
Step 1 constitutes controlled access to stored representations when bottom-up input is not sufficient.
Step 2 operates post-retrieval and is thought to bias competition among the representations activated during Step 1. According to Badre and Wagner 89 , both steps recruit the VLPFC, though different parts of it, with BA 45 involved in Step 2. In other words, rather than reflecting abstract knowledge representation, IFG activation could reflect a higher level of semantic control (additional resources), since abstract stimuli might require semantic selection, inhibition of irrelevant cues, effortful integration, top-down control, and working-memory related processes 90 , in agreement with the context availability theory 91 . In line with this hypothesis, this region showed greater activation for abstract words when a judgment task was performed following irrelevant cues and reduced activation when semantic decisions were made with contextual help, supporting the idea that this area responds more strongly to abstract words because their meanings are inherently more variable and require more control during linguistic processing than concrete ones 47,92 . An alternative explanation is offered by Della Rosa 93 : using a lexical decision task, they found that the left IFG was particularly active during the presentation of words characterized by low imageability and low context availability. The authors' interpretation was that this area could be a functional convergence zone between imageability and context availability, differentiating abstract from concrete concepts.
A result that is in clear contrast with the neuropsychological literature is the activation of the anterior part of the superior and middle temporal gyri. Indeed, apart from the main single cases of herpes simplex encephalitis and semantic dementia, showing a reversal of the concreteness effect in the presence of bilateral anterior temporal lobe damage, several group studies now confirm a concrete word impairment after anterior temporal lobe atrophy. In particular, a study comparing the behavioral variant of frontotemporal dementia (FTD), with predominant prefrontal atrophy, with the semantic variant, with anterior temporal atrophy, showed that the former group of patients had an increased concreteness effect, whereas a reversal was found in the semantic variant group. Similarly, patients with left anterior temporal lobe (ATL) resection show the same reversal of the concreteness effect 27 . One possible explanation is the type of task used: the selected studies used very different tasks (pleasantness judgment, memory tasks, lexical decision, etc.), whereas the reversal of the concreteness effect in patients is mainly found in naming and comprehension tasks and, when tested, in semantic judgments.
Orena et al. 30 , for example, using direct electrical stimulation (DES) for brain mapping during awake surgery, found no behavioral differences between BA 44 and BA 38 stimulation while patients performed a lexical decision task, but they observed a dissociation between abstract and concrete words during a concreteness judgment task: abstract words were impaired during stimulation of BA 44 and concrete words during BA 38 stimulation. However, it should be underlined that, when only abstract nouns (and not verbs) were considered, the clusters in the left superior and middle temporal gyri lost significance, supporting the idea that the cerebral networks underlying noun and verb processing might be slightly different. It is important to mention that, even when only nouns were taken into account, the stimuli selected to represent abstract or concrete items varied greatly among studies, encompassing emotions, mental states, and living and nonliving things, with different frequency of use, age of acquisition, and imageability. In addition, many studies use the terms concreteness and imageability interchangeably, although they are two distinct properties that can differently affect naming and recall [94][95][96] .
Neuroimaging studies are often hard to compare, and many variables, such as the duration of stimulus presentation and the number of stimuli, could influence the reported results. For example, in experiments of the same type, some presented a large number of stimuli [e.g., 164 nouns in 68 ], while in others only four words were repeated over more than 140 trials 72 . Another relevant element is the participants' age, since aging can modify neural organization through neuroplasticity 97 .
With two exceptions 63,71 , in which the participants' mean age was > 70, all the other studies included a young population with a mean age < 30 (see Table 1). Neuropsychological studies on patients, in contrast, involve an older population, with ages ranging from 55 to 75 years.
We also controlled for presentation modality. No relevant differences were observed between the analyses including both auditory and visual stimuli and those including only visually presented words (see Fig. 5 and Fig. 9). This may be partially due to the very small number of studies using auditory presentation (only 5 of the 32 studies used auditory stimuli).
According to Eickhoff et al. 40 , the statistical power of the current meta-analysis to detect not only large but also small- and medium-size effects can be considered acceptable. Nevertheless, meta-analytic power is intrinsically limited by the amount of currently available data, especially for two sub-analyses: (i) concrete nouns > abstract nouns, with only 15 independent experiments, and (ii) concrete words > abstract words in lexical and semantic tasks, with 16 studies. Another limitation is related to the sample size of the included experiments, which ranged from 6 to 28 participants. This presumably limited the power of the individual studies to detect small- and medium-size effects.
the IFG being relevant for abstract nouns and verbs, but we could not find evidence of an ATL role for concrete items. Our data indicate a more posterior activation for concrete words, in regions that are often associated with mental imagery processes, updating (by adding more studies and controlling for possible confounding factors) and partially confirming the results of previous reviews on the same topic. The discrepancy between clinical neuropsychological and neuroimaging data deserves further investigation, for example by means of balanced groups of healthy and clinical participants, or by combining different techniques in the same experiment, such as TMS-EEG or TMS and fMRI.