Figure 1A shows the reconstructed anatomical locations of the arrays (Montreal Neurological Institute (MNI) coordinates in Table 1) and the average normalized net responses of all visually-responsive channels to the intact versus scrambled stimuli (classic LOC stimuli and naturalistic LOC images). The significantly stronger responses to intact compared to scrambled images of objects demonstrate that all arrays were located in shape-sensitive cortex, in agreement with Decramer et al.25 However, we observed considerable diversity across the four arrays. Whereas most arrays responded more strongly to intact than to scrambled images for both localizers, for array 3 this held only for the classic localizer, and for array 1 the selectivity was weak. One possible reason for this variability is that the localizer stimuli were not optimal for each array: the stimuli presented during the localizer task may not have fully captured the preferred shapes or specific categories of each array. Had each array been presented with intact and scrambled stimuli tailored to its specific preferences, the differences in selectivity among the arrays might have been more pronounced.
Table 1
MNI coordinates of Utah arrays
| Array | X | Y | Z |
|---|---|---|---|
| 1 | 42 | -76 | -1 |
| 2 | -35 | -89 | -8 |
| 3 | -41 | -83 | 9 |
| 4 | -38 | -84 | -5 |
Single-channel responses reveal tuning complexity
We recorded from 237 visually responsive MUA sites (array 1: 51, array 2: 94, array 3: 27, array 4: 65) and 332 visually responsive LFP sites (high-gamma, 60–120 Hz; array 1: 85, array 2: 96, array 3: 86, array 4: 65). First, we determined the selectivity for shape, for category, and for any shape-category interaction (Fig. 1B) using a 2-way ANOVA on the net MUA and LFP responses (see Methods). Figure 2 shows the MUA (Fig. 2A, B, and C) and LFP (Fig. 2D, E, and F) responses for six example channels (three MUA sites and three LFP sites). The first example channel (recorded in array 2, Fig. 2A) responded strongly to several shape types (e.g. shape types 5, 6, and 8), but much less to others (e.g. shape types 7 and 9; main effect of shape, p_shape = 0.0001). The different categories within each shape type evoked similar responses in this MUA site (p_category = 0.52, p_interaction = 0.65; see Supplementary Table 1 for details on the statistics). The robust shape selectivity and lack of category selectivity were also evident in the average responses of the LFP example site (recorded in array 2) (Fig. 2D). In contrast, the example site in Fig. 2B (recorded in array 3) responded strongly to certain exemplars of the category ‘animals’ (those from shape types 5 and 6), which represents a significant shape x category interaction (p = 0.0007) with a weak main effect of category (p = 0.026) and no significant main effect of shape (p = 0.06, Supplementary Table 1). The shape x category interaction effect was even more pronounced in the high-gamma example site than in the MUA example site (eta2_MUA = 0.07, eta2_LFP = 0.19; Fig. 2E and Supplementary Table 1). Finally, the example site shown in Fig. 2C (from array 2) displayed stronger neural responses to certain members of a particular shape type (e.g. 
‘Fruits’ for shape type 6), which constituted another type of interaction between shape and category (p < 0.001), combined with a main effect of shape (p = 0.00002) but no significant effect of category (p = 0.46, Supplementary Table 1). These interactions could be due to selectivity for the specific exemplar (e.g., the fruit for shape type 6 is a bunch of grapes), to subtle differences between members of the same shape type or category in their shape and category properties, or to variations in other dimensions such as contour or texture. Overall, these results suggest that while shape selectivity is a dominant feature of the visual responses in the sites of human occipitotemporal cortex that we sampled, interactions between shape and category were also observed in a subset of neural sites.
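The per-channel statistics above amount to a balanced two-way ANOVA (factors shape type and category) on net responses, with eta2 computed as the proportion of total variance explained by each effect. The sketch below is a minimal numpy/scipy version; the 9 shape types x 6 categories x 10 repetitions design and the synthetic tuning profile are illustrative assumptions, not the paper's actual data.

```python
import numpy as np
from scipy import stats

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data: array of shape (n_shape_types, n_categories, n_repetitions)
    Returns F, p, and eta2 (= SS_effect / SS_total) per effect.
    """
    a, b, r = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))          # shape-type marginal means
    mean_b = data.mean(axis=(0, 2))          # category marginal means
    cell = data.mean(axis=2)                 # cell means

    ss_a = b * r * np.sum((mean_a - grand) ** 2)
    ss_b = a * r * np.sum((mean_b - grand) ** 2)
    ss_cells = r * np.sum((cell - grand) ** 2)
    ss_ab = ss_cells - ss_a - ss_b           # interaction sum of squares
    ss_total = np.sum((data - grand) ** 2)
    ss_err = ss_total - ss_cells

    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (r - 1)
    ms_err = ss_err / df_err

    out = {}
    for name, ss, df in [("shape", ss_a, df_a),
                         ("category", ss_b, df_b),
                         ("interaction", ss_ab, df_ab)]:
        F = (ss / df) / ms_err
        out[name] = {"F": F,
                     "p": stats.f.sf(F, df, df_err),
                     "eta2": ss / ss_total}
    return out

# Synthetic channel: strong shape-type tuning, no category effect.
rng = np.random.default_rng(0)
shape_tuning = np.linspace(0, 8, 9)                          # 9 shape types
resp = shape_tuning[:, None, None] + rng.normal(0, 0.5, (9, 6, 10))
res = two_way_anova(resp)
```

For a channel like the one in Fig. 2A, this would yield a very small p for the shape factor and non-significant category and interaction terms.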
To illustrate the shape and category responses of all visually-responsive channels, Figs. 3A and B show an overview of the z-scored responses (see Methods) per array at the MUA and LFP level, respectively. We ordered the channels from top to bottom based on their selectivity as determined in the 2-way ANOVA with factors shape type and category: channels indicated by the blue bracket showed a main effect of shape type only, channels indicated by the yellow bracket showed a main effect of category only, and channels indicated by the green bracket showed a significant shape type x category interaction (sometimes in combination with a main effect of shape type and/or category). The channels below the green bracket were visually-responsive but did not show any significant effect in the 2-way ANOVA. The order of the columns (from left to right) was determined based on the average response of all visually-responsive channels across each array separately. The plots ordered according to shape type (left panels in Fig. 3A and B) clearly illustrate that our stimulus set evoked strong MUA and LFP responses on a large number of recording channels. Additionally, the stimulus selectivity was relatively broad for all arrays (Fig. S1) (median S_width, MUA: array 1 = 0.69, array 2 = 0.62, array 3 = 0.86, array 4 = 0.7; median S_width, LFP: array 1 = 0.5, array 2 = 0.52, array 3 = 0.69, array 4 = 0.52).
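The exact definition of the breadth measure S_width is given in the paper's Methods; as an assumed stand-in, one common way to quantify tuning breadth is the fraction of stimuli that evoke more than half of the maximum baseline-corrected response, so that values near 1 indicate broad tuning and values near 0 sharp tuning:

```python
import numpy as np

def selectivity_width(net_responses):
    """Breadth of tuning for one channel: fraction of stimuli whose
    mean net response exceeds half of the maximum response after
    shifting so the weakest stimulus is at zero.
    (Assumed definition -- a stand-in for the paper's S_width.)"""
    r = np.asarray(net_responses, dtype=float)
    r = r - r.min()                      # shift weakest response to 0
    if r.max() == 0:
        return 1.0                       # flat tuning is maximally broad
    return float(np.mean(r > 0.5 * r.max()))

broad = selectivity_width([9, 8, 10, 9, 7, 8])   # similar responses everywhere
sharp = selectivity_width([1, 0, 10, 1, 0, 1])   # one dominant stimulus
```

A channel with the broad profile returns a larger S_width than the sharply tuned one, matching the interpretation of the median values reported above.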
Visual inspection does not suggest a clear preference for specific shape types in any of the arrays. When plotting the responses according to category (right panels in Fig. 3A and B), the results were qualitatively similar, except for the category ‘animals’ in array 3, for which a subset of shape types clearly evoked strong responses, as illustrated by the example channels in Fig. 2B and 2D. To investigate the overall shape type or category preference of each array more quantitatively, we averaged the MUA and high-gamma responses across all visually-responsive channels (Fig. 4A). Arrays 1, 2, and 4 responded significantly less to shape types 7, 8, and 9 (which were characterized by a lower surface area and a high aspect ratio), whereas for array 3, the MUA response to the category ‘animals’ was significantly higher compared to the other categories (Fig. 4A). The high-gamma responses ranked according to shape type (Fig. 3B, left panel) appeared very similar to the MUA responses, which was supported by significant correlations between MUA and high-gamma responses for all arrays (Fig. 4B). When plotted according to category, the high-gamma responses of array 3 showed an even more pronounced preference for the category ‘animals’ than the MUA responses (Fig. 3B and eta2 values in Fig. S2B).
Further analysis of all individual visually-responsive electrodes (using a two-way ANOVA with factors shape type and category) confirmed the high diversity of neural tuning for shape type and category. At the MUA level, the highest number of channels showed a significant interaction between shape type and category for all arrays (Fig. 3C). More specifically, out of the 237 visually responsive MUA sites, 39 sites (16%) were significantly selective for the shape type dimension alone and only 8 sites (3%) showed a significant main effect of category alone, compared to 114 sites (48%) with interactions between shape type and category (chi2 = 143, p < 0.0001). At the LFP level, we also observed mainly shape type selectivity and shape-category interactions, although arrays 1 and 2 showed more channels with a significant main effect of shape type (chi2 = 6.8, p < 0.0001). In two arrays, the proportion of significant shape type x category interactions was significantly higher in the MUA (27 and 63% for arrays 1 and 2, respectively) than in the LFP responses (12 and 22% for arrays 1 and 2, respectively); array 3 had a similar proportion of interactions in MUA and LFP, and for array 4 the LFP signal was of low quality.
To test the effect sizes for shape type and category, we compared the eta2 values of all sites with significant effects (Fig. S2). Overall, the eta2 values for shape type were higher than those for category in arrays 1, 2, and 4, and this difference was more pronounced for sites displaying a main effect of shape. Interestingly, in arrays 1, 2, and 4, even for channels with only a significant interaction or with both significant shape and category main effects, eta2 was significantly larger for shape type than for category. However, this was not the case for the shape type x category interaction channels of array 3, where the shape and category effect sizes were similarly strong.
Dissimilarity analysis suggests that shape type is the dominant representation in all arrays
The average response across individual channels can exhibit weak category selectivity, but the categorical structure of the stimulus set may also appear in the pattern of activity distributed across the entire neuronal population.20 Therefore, we investigated how information about shape type and category was represented in the multichannel activity patterns. For each pair of stimuli, we correlated the spatial multichannel response patterns for each microarray (see Methods). The resulting dissimilarity matrices (1 − correlation, Fig. 5A) were correlated with behavioral dissimilarity matrices for the shape type and category dimensions, as well as with the physical dissimilarity matrix based on the silhouettes (Fig. 5B), by means of Representational Similarity Analysis (RSA).26 For all microarrays, the multichannel analysis revealed significant shape-based and silhouette-based representations in the MUA responses, but no significant correlation with the category matrix (Fig. 5C and Table 2). At the LFP level, we observed similar results for arrays 3 and 4 (Fig. S3), but array 1 correlated significantly only with the silhouette dissimilarity matrix and array 2 only with the shape dissimilarity matrix (Table S2 and Fig. S3). Thus, the multichannel response pattern of all four arrays in LOC was predominantly shape-based. Moreover, the neural (MUA) dissimilarity matrices correlated significantly with both the perceptual and the physical dissimilarities. Interestingly, these population-level analyses suggest no contribution of category similarity, whereas the aforementioned single-channel analyses revealed many sites with an interaction between shape and category tuning.
Table 2
Results of the Representational Similarity Analysis (RSA) conducted on the MUA neural dissimilarity matrices. Rho denotes the Pearson correlation coefficient quantifying the similarity between the neural dissimilarity matrix and each model (behavioral or physical) dissimilarity matrix; p is the p-value associated with the correlation coefficient, indicating its statistical significance.
| Array | Category | Shape | Silhouette |
|---|---|---|---|
| 1 | Rho = 0.02, p = 0.27 | Rho = 0.1, p = 0.00 | Rho = 0.15, p = 0.00 |
| 2 | Rho = 0.02, p = 0.27 | Rho = 0.11, p = 0.00 | Rho = 0.10, p = 0.00 |
| 3 | Rho = 0.002, p = 0.45 | Rho = 0.2, p = 0.00 | Rho = 0.18, p = 0.00 |
| 4 | Rho = 0.03, p = 0.16 | Rho = 0.18, p = 0.00 | Rho = 0.17, p = 0.00 |
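The RSA pipeline described above (1 − correlation dissimilarities between multichannel patterns, Pearson correlation of matrix upper triangles, and a permutation-based p-value) can be sketched in a few lines of numpy. The matrix sizes and the block-structured toy model below are illustrative assumptions, not the actual stimulus set.

```python
import numpy as np

def dissimilarity_matrix(responses):
    """responses: (n_stimuli, n_channels) mean net responses.
    Returns 1 - Pearson correlation between the multichannel
    patterns of every stimulus pair."""
    return 1.0 - np.corrcoef(responses)

def rsa(neural_rdm, model_rdm, n_perm=1000, seed=0):
    """Pearson-correlate the upper triangles of two dissimilarity
    matrices; p is estimated by permuting the stimulus labels of
    the model matrix."""
    iu = np.triu_indices_from(neural_rdm, k=1)
    rho = np.corrcoef(neural_rdm[iu], model_rdm[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = neural_rdm.shape[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(n)
        null[i] = np.corrcoef(neural_rdm[iu],
                              model_rdm[np.ix_(p, p)][iu])[0, 1]
    return rho, float(np.mean(null >= rho))

# Toy check: a neural RDM that mirrors a block-structured model RDM
# (3 hypothetical "shape types" with 4 stimuli each).
model = np.kron(1 - np.eye(3), np.ones((4, 4)))
rng = np.random.default_rng(1)
neural = model + rng.normal(0, 0.05, model.shape)
neural = (neural + neural.T) / 2          # keep the RDM symmetric
np.fill_diagonal(neural, 0)
rho, pval = rsa(neural, model)
```

A model dimension that is strongly expressed in the neural RDM yields a high rho and a small permutation p, as for the shape and silhouette models in Table 2.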
Next, we visualized the representation of the stimuli in the neural space of each array using multidimensional scaling (MDS) on the dissimilarity values. The 2D solutions of the MDS are shown in Fig. 6. To evaluate the presence of clustering along each dimension, the stimuli were color-coded according to shape type (top row of Fig. 6) and semantic category (bottom row of Fig. 6). As an additional step to verify the existence of shape and/or category clusters within each array, we applied agglomerative hierarchical cluster analysis (Fig. S5). Shape clustering was evident with both methods in arrays 1, 2, and 4, with aspect ratio as an important factor mainly in arrays 1 and 2, while the MDS solution color-coded by category did not exhibit clear clustering. Array 3, on the other hand, did not exhibit strong clustering along the shape dimension, but when color-coded according to category, three exemplars of the category "animals" (rabbit, owl, and fish) were clearly separated from the other stimuli (see Fig. S4 for the LFP results, where a similar observation holds). The hierarchical cluster analysis corroborated this observation, since a subset of animal exemplars clustered together in the neural space of array 3. Overall, these findings are consistent with the shape-based representations we found in the multivariate correlation analysis, but they also suggest the presence of some additional category information in array 3.
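The two visualization steps above can be sketched with classical (Torgerson) MDS, implemented here directly with numpy (the paper's exact MDS variant is specified in its Methods), plus scipy's agglomerative clustering. The two-cluster toy dissimilarities are an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: double-center the squared
    dissimilarities and keep the top eigenvectors as coordinates."""
    d2 = np.asarray(dissim, dtype=float) ** 2
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ d2 @ J
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_dims]           # largest eigenvalues
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Toy dissimilarities: two well-separated stimulus clusters in a
# hypothetical 4-dimensional response space.
pts = np.vstack([np.random.default_rng(2).normal(0, 0.1, (5, 4)),
                 np.random.default_rng(3).normal(3, 0.1, (5, 4))])
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

coords = classical_mds(dist)                     # 2D embedding, as in Fig. 6
labels = fcluster(linkage(squareform(dist, checks=False), "average"),
                  2, "maxclust")                 # agglomerative clustering
```

Both methods should agree when the clustering is real: the embedding separates the two groups and the dendrogram cut assigns them to distinct clusters.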
Linear decoders reliably detect both shape and category information
The MDS analysis offers a representation of the stimuli in a limited number of dimensions of the neural space of the recorded population, but a decoder can utilize all the multidimensional information in a population. Moreover, decoding can be performed over time, which also gives insight into the temporal dynamics of the neural responses. Therefore, we trained linear Support Vector Machines on the neural responses per array in 100 ms bins (sliding window, 50 ms steps), and tested on each time bin of individual trials whether we could correctly classify either the shape type or the category. Figure 7A illustrates the temporal evolution of the normalized decoding accuracy at the MUA level (as described in the Methods section) for the two decoders (shape type and category); the accuracy was normalized by subtracting the chance-level accuracy. In all four arrays, we could reliably decode shape type, starting as early as 75 ms after stimulus onset for array 1, compared to 100 ms for array 2 and 200 ms for arrays 3 and 4 (Fig. 7A). Furthermore, and in line with the previous analyses, array 3 also showed significant classification of category information, which was predominantly restricted to the "animals" category (see the confusion matrix in Fig. 7B). Remarkably, however, despite the predominance of shape type representations on the other arrays, we also obtained significant classification of category on arrays 1, 2, and 4, which emerged almost simultaneously with the shape type classification. Thus, although neither individual channels nor the multichannel response pattern appeared to furnish much category information, a population of shape-selective neurons in human visual cortex contained reliable information about object category (see Fig. S6 for LFP decoding).
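The sliding-window decoding scheme can be sketched with scikit-learn's linear SVM and cross-validation. Here we subtract chance from the cross-validated accuracy per window, as in Fig. 7A; the trial counts, channel counts, sampling rate, and the synthetic two-class signal are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def sliding_window_decoding(responses, labels, fs=1000, win=100, step=50):
    """responses: (n_trials, n_channels, n_timepoints) activity sampled
    at fs Hz. Trains a linear SVM on the mean activity in each 100 ms
    window (50 ms steps) and returns cross-validated accuracy minus
    chance per window."""
    n_trials, n_chan, n_t = responses.shape
    w = int(win * fs / 1000)
    s = int(step * fs / 1000)
    chance = 1.0 / len(np.unique(labels))
    accs = []
    for start in range(0, n_t - w + 1, s):
        X = responses[:, :, start:start + w].mean(axis=2)  # rate per window
        clf = LinearSVC(max_iter=5000, random_state=0)
        accs.append(cross_val_score(clf, X, labels, cv=5).mean() - chance)
    return np.array(accs)

# Toy data: two "shape types"; the class signal appears 100 ms after
# stimulus onset on a subset of channels.
rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 30)
resp = rng.normal(0, 1, (60, 20, 400))     # 60 trials, 20 channels, 400 ms
resp[labels == 1, :5, 100:] += 2.0         # signal on 5 channels after 100 ms
acc = sliding_window_decoding(resp, labels)
```

With this construction, normalized accuracy sits near zero in the pre-signal windows and rises well above chance once the windows cover the signal period, mimicking the decoding latencies described above.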
To further investigate the predominant association of category information with the "animals" category, we removed the "animals" category and performed the decoding again (Fig. S7). The decoding accuracy for arrays 1 and 2 at both the MUA and LFP levels remained unaffected. However, a noticeable decline in both accuracy and significance was observed for array 3 at both the MUA and LFP levels. These findings were consistent with the observations from the confusion matrices (Fig. 7B, S6B), emphasizing that for array 3 the category information was predominantly restricted to the "animals" category.
Lastly, we assessed the generalization of the decoders over time (Fig. 7C). The shape and category decoders were trained using 100 ms time windows and then tested on every 100 ms window that preceded or followed the training bin, with each window shifted in steps of 50 ms. The decoding accuracy of array 2 generalized over the entire stimulus duration for both shape type and category, suggesting a very stationary population representation emerging early after stimulus onset, while arrays 1, 3, and 4 exhibited a more transient generalization of the classifier. In the high-gamma frequency range (Fig. S6), we observed on average highly similar decoding performance, albeit with lower accuracy.
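The temporal-generalization procedure (train on one time window, test on all others) can be sketched as follows. To keep the example dependency-light, a nearest-centroid classifier stands in for the linear SVM of the main analysis, and the stationary class signal is an illustrative assumption that yields the array-2-like pattern of full generalization.

```python
import numpy as np

def temporal_generalization(train_X, test_X, y_train, y_test):
    """Train at one time window, test at every window.
    train_X/test_X: (n_windows, n_trials, n_channels) window-averaged
    activity. Returns an (n_windows, n_windows) accuracy matrix,
    rows = training window, columns = testing window."""
    n_win = train_X.shape[0]
    classes = np.unique(y_train)
    acc = np.zeros((n_win, n_win))
    for i in range(n_win):                           # training window
        centroids = np.stack([train_X[i][y_train == c].mean(0)
                              for c in classes])
        for j in range(n_win):                       # testing window
            d = np.linalg.norm(test_X[j][:, None] - centroids[None],
                               axis=-1)
            acc[i, j] = np.mean(classes[d.argmin(1)] == y_test)
    return acc

y = np.repeat([0, 1], 20)

def make_session(seed):
    """Synthetic windowed activity with a persistent class signal
    (6 windows, 40 trials, 15 channels)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0, 1, (6, 40, 15))
    X[:, y == 1, :3] += 2.0          # stationary signal on 3 channels
    return X

# Independent train and test sessions to avoid testing on training trials.
gen = temporal_generalization(make_session(5), make_session(6), y, y)
```

Because the simulated signal is identical in every window, accuracy stays high across all train/test pairs; a transient representation would instead show high accuracy only near the diagonal of the matrix.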