Shared and Independent Neural Representation Between Visual Perception and Mental Imagery

Visual mental imagery and visual perception have been shown to share a hierarchical topological structure of neural representation. Meanwhile, many studies have reported a dissociation between mental imagery and perception in the function and structure of their neural substrates. However, we have limited knowledge of how the hierarchical visual cortex is involved in internally generated mental imagery versus perception with visual input. Here we used a dataset from previous fMRI research (Horikawa & Kamitani, 2017), which included a visual perception and an imagery experiment with human participants. We trained two types of voxel-wise encoding models, based on Gabor features and on activity patterns of high visual areas, to predict activity in the early visual cortex (EVC, i.e., V1, V2, V3) during perception, and then evaluated the performance of these models during mental imagery. Our results showed that during both perception and imagery, activity in the EVC could be independently predicted by the Gabor features and by the activity of high visual areas via the encoding models, suggesting that perception and imagery share neural representation in the EVC. We further found that there existed a Gabor-specific and a non-Gabor-specific neural response pattern to stimuli in the EVC, both shared by perception and imagery. These findings provide insight into the mechanisms by which visual perception and imagery share representation in the EVC.
Moreover, according to the multiple encoding models we constructed, we further investigated the neural representation during perception and imagery, and found that the two sources of information were represented independently in the hierarchy of the visual system during visual perception and mental imagery. These observations provide new insights into the underlying neural substrates of perception and imagery.

"Whilst part of what we perceive comes through our senses from the object before us, another part (and it may be the larger part) always comes out of our own head".

Introduction
Every day, we are bombarded with plenty of visual stimulation, such as colours, textures, and objects, and our brain selectively processes this information to generate visual experiences. Therefore, visual perception is a kind of reflection of the interaction between feedforward sensory input, externally driven through "bottom-up" pathways, and feedback signals, internally generated through "top-down" pathways (Hsieh et al., 2010; Kastner et al., 1998). Another form of mental process that can generate internal visual experience is mental imagery, which refers to the generation and representation of a visual image without corresponding feedforward stimuli from the real world (Andersson et al., 2019; Kosslyn, 1996; Kosslyn & Thompson, 2003).
What is the relationship between visual perception and mental imagery in generating visual experiences within our brain? To address this question, much empirical evidence from behavioral and neural studies supports the idea that visual perception and mental imagery have similar functions during sensory processing. For example, in a series of behavioral experiments, participants reached similar performance when scanning mental images and during visual perception (Borst & Kosslyn, 2008). Some researchers have proposed that mental imagery resembles weak perception (Pearson, 2019; Pearson & Kosslyn, 2015), based on the similar neural mechanisms of sensory processing between them (Borst & Kosslyn, 2008; Cichy et al., 2012; Dijkstra et al., 2019; Ishai & Sagi, 1995; Maier et al., 2020; Reddy et al., 2010; Stokes et al., 2009; Xie et al., 2020), and, according to perceptual anticipation theory (Aitken et al., 2020; Kok et al., 2012; Kosslyn & Thompson, 2003; Sohoglu et al., 2012), imagery may influence the top-down modulation of perception (Berger & Ehrsson, 2013; Diekhof et al., 2011; Pearson et al., 2008). Specifically, some functional magnetic resonance imaging (fMRI) studies have provided evidence that mental imagery and visual perception share brain activity patterns in the early visual cortex (EVC, i.e., V1, V2, and V3), which encodes low-level visual features (Kosslyn & Thompson, 2003; Maier et al., 2020), as well as in the high visual areas encoding category information (S. H. Lee et al., 2012; O'Craven & Kanwisher, 2000; Stokes et al., 2009).
Recently, machine learning approaches have been used in combination with fMRI data to provide a novel viewpoint for exploring the specific shared neural representation between visual perception and mental imagery (Albers et al., 2013; Horikawa & Kamitani, 2017; S. H. Lee et al., 2012; Pearson et al., 2008; Reddy et al., 2010). Gallant and colleagues first decoded visual perception from the EVC using Gabor features (i.e., spatial frequency, orientation, position, and phase) of natural images (Kay et al., 2008), and later applied the same decoding analysis to the content of imagining specific famous pictures (Naselaris et al., 2015), which further provided insight into the similarity between visual perception and mental imagery in the EVC. More recently, Horikawa and Kamitani (2017) used features extracted by a deep convolutional neural network (CNN) to decode brain activity in the ventral visual stream while participants were seeing and imagining natural images, which was considered to support the claim that visual perception and mental imagery share neural representation across the hierarchy of the visual system. Overall, multiple studies have collectively shown that neural representation in the ventral visual cortex during perception has a hierarchical topological structure, in which activity in high visual areas represents the semantic category of natural images (Naselaris et al., 2009) and activity in the early visual areas represents low-level visual features (Albers et al., 2013; Cichy et al., 2012; S. H. Lee et al., 2012). The same hierarchical topological structure has also been observed during mental imagery (Reddy et al., 2010). Despite this, it should be noted that many previous studies have also reported a dissociation in the function and structure of the neural substrates of perception and mental imagery (Butter et al., 1997; Sirigu & Duhamel, 2001).
However, it is still unclear how internally generated visual experience interacts with external visual stimuli to produce visual awareness during perception.
It is possible that the category information and visual features of a natural image may induce a sense of continuity, and thus the corresponding brain activities from high visual areas to the EVC might constitute a neural "signature" of the natural image. In the present study, we attempted to elaborate the internal structure of this neural "signature" during perception and mental imagery. To this end, we obtained fMRI data from previous human research involving visual perception and mental imagery (Horikawa & Kamitani, 2017) and trained two types of fMRI-based voxel-wise encoding models: an exogenous encoding model based on Gabor features of the input stimuli, and an endogenous encoding model based on the neural connectivity between regions in the high visual areas (i.e., FFA, PPA, and LOC) and the EVC (Fig 1.1; see Methods for details). We then generalized these trained voxel-wise encoding models from perception to mental imagery. Next, we investigated the neural relationship between perception and imagery via combination and separation of the neural representation patterns in the EVC predicted by the two types of encoding models. Finally, we divided the voxels in the EVC into two groups according to Gabor specificity and non-specificity, to refine the neural substrates of perception and imagery.

Fig 1.1. Neural encoding through the Gabor wavelet pyramid and brain-area connectivity models. When a person was seeing or imagining an object image, the visual areas of the brain were processing neural information. We used the Gabor wavelet pyramid to filter images to obtain low-level visual features, and then combined these with the voxel activity pattern in the early visual cortex (V1, V2, and V3) to train voxel-wise Gabor encoding models. At the same time, we trained analogous voxel-wise encoding models with the neural connectivity between the early visual areas and the high visual areas (FFA, LOC, and PPA).
Moreover, we further investigated the neural representation during perception and imagery according to the multiple encoding models we constructed.

Data acquisition
The fMRI datasets were provided by the Kamitani Lab at Kyoto University and ATR (https://github.com/KamitaniLab). Here, we give a brief description of the stimuli, datasets, and experimental design.
A total of 1250 natural images (j = 1, 2, 3, …, 1250), selected from 200 representative object categories (c = 1, 2, 3, …, 200), were used in the whole study. Five healthy human subjects (four males) participated in all experiments. During BOLD-fMRI scanning, each subject was instructed to view natural images at the center of the projection screen (visual perception experiment, two sessions) and to imagine corresponding objects according to word cues presented on the screen (visual imagery experiment, one test session). The perception experiment comprised a training image session and a test image session with the same procedures. The two sessions consisted of 24 and 35 separate runs, respectively; each run lasted 9 min 54 s and consisted of 50 trials with different images plus 5 randomly interspersed repetition trials. The repetition trials were used to keep subjects attentive: subjects performed a one-back task and pressed a button if a trial was repeated. Each image was presented at the center of the screen, subtending 12 × 12 degrees of visual angle, and flashed at 2 Hz for 9 s. In sum, 1200 different images (j = 1, 2, 3, …, 1200) from 150 object categories (8 images per category, c = 1, 2, 3, …, 150) were presented only once in the training image session, and 50 different images (j = 1201, 1202, …, 1250) from 50 new object categories (c = 151, 152, …, 200) were presented 35 times each in the test image session. In the imagery experiment, each subject was instructed to read a red word cue presented on the screen and then to visually imagine objects with closed eyes after hearing a beep. The word cues were the names of the 50 object categories (c = 151, 152, …, 200) matched to the images presented in the perception test session, but subjects were asked to freely imagine as many objects from the cued category as they could during each trial.
The imagery session consisted of 20 runs, each of which contained 25 imagery trials and lasted 10 min 39 s.
The detailed procedures for MRI data preprocessing and functional brain region localization can be found in the original paper (Horikawa & Kamitani, 2017). The original study was approved by the Ethics Committee of ATR, and the data re-analysis was approved by the authors of the original paper.

Gabor feature
We used Gabor features extracted from the 1250 natural images by a Gabor wavelet pyramid (GWP) model to encode the activity in the EVC (Jones & Palmer, 1987;Ringach, 2002). Prior studies demonstrated that the GWP model could be viewed as an appropriate method to describe voxel activity in the EVC (Daugman, 1985;Jones & Palmer, 1987;T. S. Lee, 1996;Rainer et al., 2001). The GWP model has been used to describe the dimensions of space (Deyoe et al., 1996;Engel et al., 1994;Sereno et al., 1995), orientation (Haynes & Rees, 2005;Kamitani & Tong, 2005;Sasaki et al., 2006), and spatial frequency (Olman et al., 2004;Singh et al., 2000) of natural images. According to a prior study (Kay et al., 2008), these Gabor features extracted from natural images by a GWP model could be fitted to voxel responses in the EVC, especially in the primary visual cortex (V1).
We then applied the constructed GWP model to extract Gabor features of the stimulus images. The original 500 px × 500 px images were down-sampled to 128 px × 128 px resolution for the analysis. Accordingly, the features were defined as

f = log(|Wᵀs|),

where f was an F × 1 vector containing the features (F = 10920, the number of filters used for the model), and W denoted a matrix of complex Gabor filters. The variable s corresponds to the vector of pixel values of each image used in the whole experiment; W had as many rows as the pixels in s, and each column contained a different Gabor filter, so its dimension was 128² × 10920. The features were the log of the magnitudes derived from filtering the image with each filter. These parameters corresponded to those used by Gallant and colleagues (Kay et al., 2008; Naselaris et al., 2015).
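As an illustration, the feature computation above can be sketched in NumPy. This is a minimal toy pyramid with made-up filter parameters (a few frequencies and orientations at a single spatial position), not the 10920-filter bank used in the study, and it uses real-valued filters with an absolute value in place of the complex quadrature magnitudes:

```python
import numpy as np

def gabor_filter(size, freq, theta, phase, cx, cy, sigma):
    """One real-valued Gabor filter on a size x size grid (toy parameters)."""
    y, x = np.mgrid[0:size, 0:size] / size - 0.5   # normalized coordinates
    xr = (x - cx) * np.cos(theta) + (y - cy) * np.sin(theta)
    g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
    return g * np.cos(2 * np.pi * freq * xr + phase)

def gabor_features(image, filters):
    """f = log(|W^T s|): log-magnitude of each filter response."""
    s = image.reshape(-1)                            # flatten image to a pixel vector
    W = np.stack([w.reshape(-1) for w in filters])   # F x (128^2) filter matrix
    return np.log(np.abs(W @ s) + 1e-6)              # small constant avoids log(0)

# toy pyramid: 3 frequencies x 4 orientations at the image centre
filters = [gabor_filter(128, f, t, 0.0, 0.0, 0.0, 0.1)
           for f in (2, 4, 8) for t in np.linspace(0, np.pi, 4, endpoint=False)]
feat = gabor_features(np.random.rand(128, 128), filters)
print(feat.shape)   # (12,)
```

A faithful reproduction would tile filters over spatial positions and combine quadrature (even/odd phase) pairs into complex magnitudes before taking the log, as in the original GWP model.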

Encoding models
The responses in the EVC when people see something could be explained by two sources: the external stimulus and internally generated signals from high visual areas (i.e., FFA, PPA, and LOC). To simulate this two-source information processing, we constructed two types of voxel-wise encoding models, one based on Gabor features (the exogenous encoding model) and one based on the neural connectivity between the EVC and high visual areas (the endogenous encoding model), to encode the neural activity of each voxel in the EVC.
For the exogenous encoding model, we extracted the BOLD-fMRI signals of each voxel in the EVC related to the 1200 training images (j = 1, 2, …, 1200) during perception. For each voxel, we regarded the Gabor features as the input variable and the fMRI signal activity as the output response to train the voxel-wise encoding models.
According to the prior study, p was defined as the number of training images and q as the number of input channels. The response of each voxel in the EVC could be modeled as

y = Xh + c1 + n,

where y is the set of output responses (i.e., the response of each voxel in the EVC) with dimension p × 1, X is the set of input channels (p × q), h is the kernel (q × 1), c is the DC offset (1 × 1), 1 is a vector of constant ones (p × 1), and n is the noise (p × 1). We used functions from the STRFlab toolbox (Version 1.45, http://strflab.berkeley.edu/) to automatically estimate the model parameters. The model parameters were estimated with gradient descent combined with an early stopping algorithm to prevent over-fitting, where the stopping set consisted of 20% randomly selected responses (Kay et al., 2008; Tugnait, 1994). A bootstrap sampling approach was used for iterative analysis. This procedure was conducted independently for each voxel in the EVC; thus, we finally obtained an encoding model for each voxel in V1, V2, and V3 during perception.
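A minimal sketch of the voxel-wise fitting procedure, assuming plain full-batch gradient descent with a 20% held-out stopping set; this illustrates the early-stopping idea only and is not the actual STRFlab implementation (the function name `fit_voxel` and all hyperparameters are hypothetical):

```python
import numpy as np

def fit_voxel(X, y, lr=1e-4, max_iter=2000, stop_frac=0.2, patience=20, seed=0):
    """Fit the linear model y ~ X h + c by gradient descent, stopping early
    when the error on a randomly held-out 20% stopping set stops improving."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_stop = int(stop_frac * len(y))
    stop, train = idx[:n_stop], idx[n_stop:]
    h = np.zeros(X.shape[1])
    c = 0.0
    best = (np.inf, h.copy(), c)
    bad = 0
    for _ in range(max_iter):
        resid = X[train] @ h + c - y[train]
        h -= lr * X[train].T @ resid / len(train)   # gradient step on the kernel
        c -= lr * resid.mean()                       # gradient step on the DC offset
        err = np.mean((X[stop] @ h + c - y[stop]) ** 2)
        if err < best[0]:
            best, bad = (err, h.copy(), c), 0
        else:
            bad += 1
            if bad >= patience:      # stopping-set error no longer improving
                break
    return best[1], best[2]

# toy check: recover a sparse kernel from noisy simulated responses
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
h_true = np.zeros(10)
h_true[:3] = [1.0, -0.5, 0.3]
y = X @ h_true + 0.1 * rng.standard_normal(200)
h_hat, c_hat = fit_voxel(X, y, lr=1e-2)
```

The actual analysis additionally wrapped this in bootstrap resampling and ran it once per voxel in V1, V2, and V3.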
To train the endogenous encoding model, we employed the voxel activity in each high visual area as the input features (each voxel as a feature) to replace the Gabor features in the exogenous encoding model and the fMRI signal activity as the output. The rest of the operations were consistent with those above used in training the exogenous encoding model.

Image identification analysis
The exogenous and endogenous encoding models trained with the perception training dataset were used in the identification of images, based on brain activities from the testing dataset during perception and mental imagery, respectively.
In the viewed image identification analysis (perception), we used the test session dataset of 50 new viewed images (j = 1201, 1202, …, 1250) to estimate the performance of the trained exogenous and endogenous encoding models, respectively. For the exogenous encoding model, we extracted the Gabor features of each test image, and then fed these low-level visual features into the model to predict the corresponding voxel response. In this way, each voxel-wise encoding model produced a prediction value for each voxel in the EVC. Each region of the EVC (i.e., V1, V2, and V3) was treated as a basic unit, so the predicted brain activity pattern in V1, V2, and V3 could be compared with the real fMRI signal pattern to estimate the performance of the encoding model. To this end, a Pearson's correlation coefficient was computed for each pair between the two sets (predicted and real) of test images (n = 50), yielding a 50-by-50 correlation matrix for each subject.
If the diagonal value of the obtained matrix was the maximum in its column, the predicted voxel activity pattern matched the real one for the same image well, and we regarded it as a correct prediction. Finally, we calculated the ratio of the number of correct predictions to the total number of test images (correct predictions/50), which we regarded as the prediction accuracy.
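The identification scheme above can be sketched as follows; `identification_accuracy` is a hypothetical helper name, and the toy data stand in for the 50 test-image patterns:

```python
import numpy as np

def identification_accuracy(pred, real):
    """pred, real: (n_images, n_voxels) activity patterns.
    Correlate every predicted pattern with every real pattern and score an
    image as correct when the matching (diagonal) correlation is the column
    maximum, as described above."""
    n = len(pred)
    # z-score each pattern so a scaled dot product equals Pearson's r
    zp = (pred - pred.mean(1, keepdims=True)) / pred.std(1, keepdims=True)
    zr = (real - real.mean(1, keepdims=True)) / real.std(1, keepdims=True)
    corr = zp @ zr.T / pred.shape[1]                 # n x n correlation matrix
    correct = corr.argmax(axis=0) == np.arange(n)    # diagonal max per column
    return correct.mean()

# toy check: mildly noisy predictions identify all 50 patterns
rng = np.random.default_rng(0)
real = rng.standard_normal((50, 300))
print(identification_accuracy(real + 0.1 * rng.standard_normal((50, 300)), real))  # → 1.0
```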
We used the same procedure to calculate accuracy with the trained endogenous encoding models; the only difference was the input features. We regarded the brain activities of FFA, PPA, and LOC as the input features and then predicted the brain activity of each region of the EVC. Finally, we compared the predicted responses with the real ones to obtain the prediction accuracies.
In the imagined image identification analysis (mental imagery), there was no stimulus input during mental imagery, so we adopted the same Gabor features of the viewed images to estimate the performance of the exogenous encoding model. We then compared the predicted activity in the EVC with the real activity from the imagery test data. The calculation of prediction accuracy was consistent with that described above for the perception experiment. The predictions of the endogenous encoding models in the imagery experiment were conducted in the same way to obtain the corresponding accuracies.

Linear combination with two sources representations
The neural representation in the EVC could be explained by two sources: a stimulus-relevant neural representation and a corresponding internally generated representation. We therefore applied two approaches to explore the joint action of the two source representations in the EVC. Taking V1 as an example, the trained exogenous and endogenous encoding models could each provide a predicted value for every voxel, and the values of these voxels in V1 formed a brain activity pattern; thus, we obtained two types of predicted brain activity patterns, one per encoding model. The linear combination was computed as the arithmetic mean of the predicted neural activity patterns from the two types of voxel-wise encoding models, and the nonlinear combination as their geometric mean. The combined V1 activity pattern was then compared with the real V1 fMRI signal pattern to show the combined effect of the two types of encoding models. With the same approaches, we evaluated the combined effect within V2 and V3, respectively.
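A sketch of the two combination rules, assuming the arithmetic and geometric means are taken element-wise over voxels; because predicted BOLD values can be negative, the geometric mean here is a signed-magnitude variant, which is our simplifying assumption rather than a detail given in the text:

```python
import numpy as np

def combine_patterns(exo, endo, mode="linear"):
    """Combine predicted activity patterns from the two encoding models.
    linear    -> element-wise arithmetic mean
    nonlinear -> element-wise geometric mean of magnitudes, carrying the
                 sign of the averaged prediction (simplifying assumption)."""
    if mode == "linear":
        return (exo + endo) / 2.0
    return np.sign(exo + endo) * np.sqrt(np.abs(exo * endo))

exo = np.array([0.2, -0.1, 0.4])    # toy exogenous (Gabor) prediction
endo = np.array([0.4, -0.3, 0.2])   # toy endogenous (connectivity) prediction
print(combine_patterns(exo, endo, "linear"))      # arithmetic mean
print(combine_patterns(exo, endo, "nonlinear"))   # geometric-mean variant
```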

Separation of imagery from perception
Imagery has been considered a weak form of perception (Pearson & Kosslyn, 2015), which implies that mental imagery shares a large part of its processing with visual perception. Based on this view, we further refined the relationship between the two mental processes in terms of the neural connectivity between high visual areas and the EVC. We repeated the prediction procedures of the image identification analysis with the trained endogenous encoding model, and then treated the predicted imagery activity pattern in the EVC as a covariate to be removed from the predicted perception activity pattern.
Finally, the remaining predicted activity pattern for perception was compared with the real fMRI signal pattern in the EVC to compute prediction accuracy. To illustrate the effect of this separation, we computed two additional prediction performances for comparison: one removed a scattered predicted imagery activity pattern and the other a fake activity pattern consisting of random white noise. We regarded these as covariates, removed them from the predicted perception pattern, and then calculated the prediction accuracies, respectively.
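One plausible way to implement the covariate-removal step is to regress the predicted imagery pattern out of the predicted perception pattern and keep the residual; the exact estimator is not specified in the text, so this sketch is an assumption (the helper name `remove_covariate` is hypothetical):

```python
import numpy as np

def remove_covariate(pattern, covariate):
    """Regress the covariate pattern (plus an intercept) out of the
    predicted pattern and return the residual, which is orthogonal to
    the covariate."""
    X = np.column_stack([covariate, np.ones_like(covariate)])
    beta, *_ = np.linalg.lstsq(X, pattern, rcond=None)
    return pattern - X @ beta

# toy check: after removal, the residual is uncorrelated with the covariate
rng = np.random.default_rng(0)
imagery = rng.standard_normal(300)                       # predicted imagery pattern
perception = 0.6 * imagery + rng.standard_normal(300)    # shared + unique part
resid = remove_covariate(perception, imagery)
print(abs(float(np.corrcoef(resid, imagery)[0, 1])) < 1e-8)   # → True
```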

Predictions of Gabor specificity and non-Gabor specificity
To further elaborate the represented information content in the EVC, we labelled the voxels in each region of the EVC with two complementary labels, defined by the weight values obtained from training the exogenous encoding model: Gabor specificity, referring to voxels with useful (non-zero) weight values, and non-Gabor specificity, referring to voxels with zero weight values. Here, a useful or zero weight value denoted whether or not a voxel encoded the Gabor features. Accordingly, we divided the voxels in each region of the EVC into two groups. We then conducted the same prediction procedures via the endogenous encoding model for each group in V1, V2, and V3, respectively.
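The labelling rule can be sketched directly from the trained kernels; `split_by_gabor_specificity` is a hypothetical helper name:

```python
import numpy as np

def split_by_gabor_specificity(weights):
    """weights: (n_voxels, n_features) kernels from the exogenous model.
    A voxel is 'Gabor-specific' if any of its Gabor weights is non-zero,
    and 'non-Gabor-specific' if its whole kernel is zero, following the
    labelling rule described above."""
    specific = np.any(weights != 0, axis=1)
    return np.where(specific)[0], np.where(~specific)[0]

# toy kernels for four voxels over two Gabor features
W = np.array([[0.0, 0.3],
              [0.0, 0.0],
              [1.2, 0.0],
              [0.0, 0.0]])
gabor_idx, non_gabor_idx = split_by_gabor_specificity(W)
print(gabor_idx, non_gabor_idx)   # [0 2] [1 3]
```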

Code and data accessibility
The code for generating the Gabor wavelet pyramid is freely available online.
In the imagery experiment, subjects were required to freely imagine as many object images matching the word cue as possible (Horikawa & Kamitani, 2017), but we evaluated prediction performance based on a specific image, following the same procedures as in perception. Thus, there was no guarantee that the Gabor features of a specific image could successfully encode the brain activity triggered by the whole object category.
To further explore the brain activity of imagery in the EVC that was explained by Gabor features, we extracted 1000 voxels across the whole EVC. First, the Pearson's correlation coefficients between the predicted voxel activity and the real activity of each subject were sorted in descending order, and we took the top 1000 values (except for the fifth subject, who had only around 700 voxels meeting the condition). Next, we depicted their spatial distribution in the EVC, with the Pearson's correlation coefficients regarded as node weights, via the BrainNetViewer toolbox (https://www.nitrc.org/projects/bnv/). All the above calculations were done for both the perception and imagery data. We found that the overlap of the extracted voxels between perception and mental imagery was very large: 96% for subject 1, 95% for subject 2, 76.3% for subject 3, 84.5% for subject 4, and 100% for subject 5 (whose number of qualifying voxels in the EVC was 734), suggesting that the spatial distributions were similar between the two mental processes for each subject. The median of the average weight across subjects was 0.28 during perception (ranging from -0.09 to 0.81) and 0.02 during imagery (ranging from -0.21 to 0.44), which determined the size of the colour nodes in Fig 3.1b.
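The voxel-selection step can be sketched as follows, assuming per-voxel Pearson correlations across test images and a top-k cut; the toy data and the helper name `top_voxels` are illustrative:

```python
import numpy as np

def top_voxels(pred, real, k=1000):
    """pred, real: (n_images, n_voxels). Compute the per-voxel Pearson r
    between predicted and real responses across images and return the
    indices of the k best-predicted voxels."""
    zp = (pred - pred.mean(0)) / pred.std(0)
    zr = (real - real.mean(0)) / real.std(0)
    r = (zp * zr).mean(0)                      # per-voxel correlation
    return set(np.argsort(r)[::-1][:k])

# toy check: overlap of top voxel sets between two prediction conditions
rng = np.random.default_rng(0)
real = rng.standard_normal((50, 2000))
pred_a = real + 0.5 * rng.standard_normal((50, 2000))   # well-predicted condition
pred_b = real + 2.0 * rng.standard_normal((50, 2000))   # noisier condition
set_a, set_b = top_voxels(pred_a, real), top_voxels(pred_b, real)
overlap = len(set_a & set_b) / 1000           # overlap proportion, as in the text
print(len(set_a), 0.0 <= overlap <= 1.0)
```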

Visual cortex internal neural correlation during perception and imagery
From the prediction accuracies of the trained endogenous encoding models, we found that the brain activity patterns in the three high visual areas were able to successfully predict the brain activity pattern in the EVC (Fig 3.1a). In particular, FFA and LOC both obtained prediction accuracies of more than 40% for V1 during perception (more than 10% during imagery). Critically, the endogenous encoding models were trained only on the perception training data, yet they generalized to predict brain activity under the mental imagery condition (e.g., the average accuracies of the FFA predictions were 12% ± 0.08 (χ²(1, N = 50) = 3.5714, p = 0.0588) in V1, 21% ± 0.09 (χ²(1, N = 50) = 7.8478, p = 0.0051) in V2, and 26% ± 0.10 (χ²(1, N = 50) = 10.2860, p = 0.0013) in V3). Moreover, the prediction accuracies showed a gradient ascent across the EVC, with the high visual areas predicting V1 with the lowest accuracy and V3 with the highest, and the imagery predictions showed the same tendency as the perception predictions (i.e., V1 < V2 < V3). This provided further evidence for a gradient hierarchical structure in the EVC that is shared by perception and imagery (Fig 3.1a).

Dual representations from Gabor features and cortical internal correlations
We explored the combined effect of the exogenous and endogenous encoding models in explaining brain activity in the EVC during perception and mental imagery. For perception, the results showed that a linear combination of the two types of encoding models could effectively improve the prediction power: the average accuracies increased by more than 15%. In contrast, the average accuracies were mostly reduced to below 10% under the nonlinear combination condition (green lines in Fig 3.2a). There was no evident linear or nonlinear combination effect in the mental imagery experiment (red lines in Fig 3.2a).
When removing the neural activity of mental imagery from that of perception with the help of the two types of encoding models, we found that the average prediction accuracies fell to a range from 10% to 50%, still significantly above chance level (2%). However, after we removed the scattered imagery pattern or the random white noise from the perception pattern (Fig 3.2b), the prediction accuracies of the three high visual areas predicting the EVC were close to the original prediction performance of the inner cortical connectivity. These observations showed that imagery shares much of its information representation with perception through visual neural connectivity.

Shared and independent representation in the EVC
To further elaborate the information content of the representation of voxel activity in the EVC, we divided these voxels into two groups with Gabor specificity and with non-Gabor specificity (for more details, see the Methods). We then tested performance of the exogenous and endogenous encoding models specially related to the two groups of voxels.
Interestingly, during the procedure of labelling the voxels in the EVC, we found a stable topological distribution of the ratio between Gabor specificity and non-Gabor specificity: regardless of whether the total number of selected voxels was 1000, 800, 600, 400, or 200, the ratio of the two labelled groups remained close to 1:2 (see Supplementary Fig 3.4).
Comparing the prediction performances of the two groups, we found that the accuracies obtained via the endogenous encoding model were almost equal in each area of the EVC, in both the perception and the mental imagery conditions. Moreover, the prediction performances of the two groups were similar to the original performance of the endogenous encoding model. For clarity, we present the results of Subject 3 in Fig 3.3 (for the other subjects' results, see Supplementary Figs 3.3.1, 3.3.2, 3.3.3, and 3.3.4).
Likewise, we explored the combined effect of the two types of encoding models for the two groups of voxels, respectively. The results showed similar prediction performances between the two groups of voxels within a hierarchical fashion, which further supported parallel representational contents in the EVC. GP, the Gabor-specificity prediction during perception; NP, the non-Gabor-specificity prediction during perception; OP, the original high visual area predictions during perception; the red bars represent the corresponding predictions during imagery.
The bars in the lower right panel show the superimposed effect of the three high visual areas, comparing the original superimposed effect with the combination of the high visual areas' predictions based on the non-Gabor-specificity voxels and the Gabor feature predictions in the EVC.

Discussion
Using the trained exogenous and endogenous encoding models, we quantified the elaboration of a visual neural "signature" and its connectivity to visual features, which provides an avenue to explore how internally generated representations and externally stimulated representations interact within our brain. In particular, the application of voxel-wise encoding models allowed us to examine the encoded information content of each voxel within the EVC. Accordingly, we found the existence of shared and independent neural representations between perception and visual imagery, which helps us to understand the neural substrates of these two similar mental processes.
Prior to the current work, previous research applied multi-voxel pattern classification (MVPC) to investigate the relationship between perception and imagery (Albers et al., 2013; Harrison & Tong, 2009; S. H. Lee et al., 2012; Tong, 2013) and to show the complexity of interaction within the visual cortex. Because MVPC is limited in its ability to decompose brain activity patterns into distinct sources of variation, how to measure the represented content is still largely unclear. This study used voxel-wise encoding models (Naselaris et al., 2015; Thirion et al., 2006) to refine the shared neural representation between visual perception and mental imagery. The results showed excellent prediction performance of the internal neural correlation with high visual areas (especially FFA and LOC) in both the perception and imagery conditions. Besides, we found a linear combination effect in the interaction between visual input and the internal neural association between high-level visual areas and the EVC, which was consistent with a previous finding (Zhang et al., 2014). Furthermore, we removed the imagery activity, as a covariate, from the internal activation part of perception to make an in-depth exploration of the representational mechanism of perception. The result showed that the internal neural association in the ventral visual pathway during perception could be partly explained by that of mental imagery. More importantly, we could readily divide the voxels in the EVC into a Gabor-specificity group and a non-Gabor-specificity group based on the voxel-wise encoding models. These results indicated that shared and independent neural representations coexist between perception and imagery.
Combining the voxel-wise encoding models, we found that perceptual representation combined Gabor features and cortical internal connectivity in the EVC in a linear rather than nonlinear fashion, which differs from the modular representation of the brain (Op de Beeck et al., 2008). In addition, the majority of the representation in the EVC could be explained by internal neural connectivity, i.e., top-down expectation, consistent with assertions that cortical feedback is an essential part of brain processing (Morgan, 2018). The linear representation could be regarded as reflecting independent representations of the two sources of information, i.e., stimulus-relevant information and internally generated information, which might provide new evidence for the "conscious copy" model of working memory (Jacobs & Silvanto, 2015). The equal prediction performances for Gabor-specificity and non-Gabor-specificity voxels via the endogenous encoding model provided effective evidence for independent representation. This voxel-wise analysis associating the EVC with high-level visual areas provides a viewpoint from which to refine the shared neural representational mechanism between visual perception and mental imagery.
Additionally, Albers and colleagues found empirical evidence to support a common internal representation between imagery and working memory (Albers et al., 2013). There has been a longstanding debate about whether the content of working memory can be regarded as its conscious experience (Atkinson & Shiffrin, 1968; Baars, 2002; Baars & Franklin, 2003; Cowan, 1988; Oberauer, 2002, 2009). The classical working memory model proposed that all the contents of working memory could be accessible to consciousness (A. Baddeley, 1992; A. D. Baddeley & Hitch, 1974). However, subsequent studies showed that the contents of working memory accessible to conscious awareness were selective (Baars, 1993) and that attention had a role in allowing some part of the contents to reach consciousness (Baars, 2002; Cowan, 1988; Oberauer, 2002, 2009). Further research revealed that working memory contents and the corresponding consciousness were not identically represented, and that there may be a kind of independent and parallel representation between subjective conscious awareness and the memory trace. First, studies showed that working memory performance accuracy was independent of how accessible visual working memory contents were to conscious awareness, as shown by the vividness of mental imagery and task confidence (Hassin et al., 2009; Soto et al., 2011). Second, changing the memory trace could influence the performance of working memory, but not the corresponding consciousness (Bona et al., 2013; Bona & Silvanto, 2014). Third, the contents of working memory, whether accessible to consciousness or not, have distinct influences on the processing of visual input (Craver-Lemley & Reeves, 1992; Pan et al., 2014; Perky, 1910). Specifically, if visual input matches the content of working memory, it can access consciousness easily, while the mismatched part is restrained or discarded.
Together, this empirical evidence suggests that the representation of working memory is internally unconscious, while the corresponding conscious awareness may create a new, consciously accessible representation whose function differs from that of working memory itself.
Mental imagery, an internal conscious perceptual process (Kosslyn, 1996), can be cast as the conscious experience of working memory content (Jacobs & Silvanto, 2015) and may directly speak to the separation between representations of external stimuli and internal consciousness. Here, we confirmed that mental imagery shared an underlying neural representation with visual perception in cortical connectivity, which might be considered the introspection of working memory content. Strikingly, the prediction accuracies for Gabor-specific and non-Gabor-specific voxels during imagery were nearly equal, as they were during perception. This may imply that imagery, as internally generated consciousness, not only creates the corresponding conscious experience but also internally represents the relevant features of an external stimulus. Jacobs and Silvanto (2015) proposed that the conscious awareness of working memory does not change the original memory trace, but rather creates a new representation that coexists with the original.
Mental imagery, regarded as the ability to generate visual experience without being triggered by corresponding sensory stimulation (Kosslyn, 1996), provides an important means to study the complex topic of brain representation. Many neuroscience studies have examined the similarities and discrepancies between perception and imagery (Cichy et al., 2012; S. H. Lee et al., 2012; Schaefer et al., 2013) and have furnished much evidence concerning the neural representation of the world. In particular, an increasing number of studies have confirmed that neurons in the EVC do not operate merely as feature detectors (Kayser et al., 2004; T. S. Lee & Nguyen, 2001; Sugita, 1999). With ground-truth measurements, our results demonstrated that neurons in the EVC were active during both imagery and perception, and that their activation could be effectively predicted from high-level visual neural activity. Moreover, our results showed the same prediction performance for Gabor-specific and non-Gabor-specific voxels in the EVC, suggesting that the brain might simultaneously generate an internal copy of the stimulus representation while responding to perturbations of the stimulus from the real world.
Several issues should be considered in future work. First, the voxel-wise encoding models gave us an opportunity to explore the shared and independent representation mechanisms between visual perception and mental imagery; with further development of techniques and methods, the questions of dynamic neural representation (Dijkstra et al., 2018) and the visualization of the independent representation need to be addressed. Second, a clearer and more elaborate account of the representation mechanism offers an alternative avenue for building and updating artificial intelligence systems with consciousness. Third, the shared and independent neural representation mechanism may lead us toward a deeper understanding and explanation of abnormal phenomena of brain information processing, such as those found in hallucination and schizophrenia. Such results could in turn lead to more effective and well-grounded treatments.
In conclusion, the present study applied voxel-wise encoding models representing two sources of information: stimulus relevance and internal neural connectivity. We first showed that during perception the EVC carries a linear combination of Gabor features and internal cortical connectivity, and that this pattern was shared by mental imagery.
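The linear-combination claim can be illustrated by comparing the held-out prediction accuracy of a model using the two feature sets additively against one augmented with multiplicative interaction terms. The sketch below uses simulated data with a purely additive ground truth; the feature names and the interaction-term construction are illustrative assumptions, not the study's actual model comparison.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_train, n_test = 300, 100
n = n_train + n_test

# Two hypothetical feature sources for a single EVC voxel:
# stimulus-driven (Gabor-like) and internally generated (connectivity-like)
f_stim = rng.standard_normal((n, 20))
f_endo = rng.standard_normal((n, 20))

# Ground truth: the voxel response is a purely additive (linear) mixture
y = (f_stim @ rng.standard_normal(20) + f_endo @ rng.standard_normal(20)
     + 0.5 * rng.standard_normal(n))

def heldout_r(X, y):
    """Held-out Pearson r of a ridge model fit on the training trials."""
    m = Ridge(alpha=1.0).fit(X[:n_train], y[:n_train])
    return np.corrcoef(m.predict(X[n_train:]), y[n_train:])[0, 1]

X_linear = np.hstack([f_stim, f_endo])             # additive combination
X_nonlin = np.hstack([X_linear, f_stim * f_endo])  # plus interaction terms

r_lin = heldout_r(X_linear, y)
r_non = heldout_r(X_nonlin, y)
# If the representation is linear, interaction terms add no predictive power
print(f"linear r = {r_lin:.2f}, with interactions r = {r_non:.2f}")
```

Under an additive ground truth, the interaction-augmented model performs no better than the additive one, which is the signature of a linear rather than nonlinear combination of the two information sources.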
We further demonstrated that the two sources of information were represented independently in the hierarchy of the visual system during visual perception and mental imagery. These observations provide new insights into the underlying neural substrate shared by perception and imagery.