Using electroencephalography (EEG) in a paradigm of rapid presentation of face stimuli, the current study examined the neurophysiological underpinnings of realness perception in gradually stylized human face images. A previous study on this dataset mainly focused on the correlation between the amplitudes of SSVEP responses at one specific channel and behavioral data30. To extend these findings, our study aimed to comprehensively explore neuronal processes reflecting realness perception. To this end, we analyzed neuronal responses in both the frequency and the time domain, using SSVEP and evoked responses (N170), respectively. We found that the amplitudes of neural responses, reflected in both SSVEP and N170 potentials, exhibited a quadratic relationship with the degree of realness. Although another previous study using the same stimulus material but a different paradigm reported a similar quadratic relationship between N170 amplitudes and the level of realness31, there is no conclusive explanation for why this phenomenon exists, as it may also reflect low-level visual features related to the different realness levels.
The N170 is an ERP component that has been repeatedly shown to reflect face processing32. Furthermore, many studies have found the N170 to be modulated by structural properties of face stimuli, such as emotional expression33, facial movement in general35, and eye gaze direction in particular36,37,38. In this context, it has been suggested that the N170 is generated by brain processes involved in the structural encoding of face stimuli65. Thus, such configurational analyses of the face’s features may be a critical driver of N170 amplitude effects. Another classical experiment in face perception showed that N170 amplitudes also increase when participants are presented with inverted faces34. In other words, the brain may need more "effort" to process inverted faces, resulting in higher N170 amplitudes. Combining these two points in the context of our study, we propose that, analogous to the increase in N170 amplitudes for inverted faces, the human brain may need more "effort" to recognize cartoon-like images as genuinely human faces, leading to the highest amplitudes for the most stylized images (R0 in this study). At the same time, following the structural encoding hypothesis6, more neural activity is evoked by the increasing number of facial details as stimulus realness increases, such as richer information on emotional facial expressions or identity cues, which should in turn result in higher N170 amplitudes for real photos (R5). For images in the middle range of realness, the human brain may need to compromise between these two factors, which leads to the quadratic modulation effect. In addition, the non-linear relation between realness and brain responses could be partially explained by fluctuations of emotional arousal in the UV effect.
That is, following the definition of the UV effect, images with different realness levels trigger different emotional feelings, while emotional arousal in turn affects the general level of related responses (e.g., the N170). Indeed, the emotional component of the UV effect was observed in the behavioral results of Bagdasarian et al. (2020)30, where most participants reported that images of class R4 were more likely to evoke negative feelings (reflected in appeal, reassurance, and attractiveness) compared with classes R3 and R5.
Although the SSVEP is conceptualized as a response to periodic stimuli, and the ERP as a response to individual stimuli, our study found compatible quadratic relationships between amplitude and realness level for both SSVEP and N170. Researchers often do not analyze ERP responses in an SSVEP paradigm because the high stimulus-presentation frequency leads to an overlap of the current and preceding evoked responses. However, benefiting from the 200-ms inter-stimulus interval in the current study, we had the opportunity to bridge the gap between previous studies that found an effect of face realness on either N170 amplitudes or SSVEP amplitudes. Our data suggest that these effects may originate from the same neuronal mechanism, hypothetically the structural encoding of facial features. Moreover, the SSVEP can be modeled as the temporal superposition of transient ERPs66, which probably explains why we found similar quadratic modulation effects in SSVEP and N170 amplitudes. However, SSVEP responses are a superposition not only of N170 components but also of other ERP components, such as the P100. Presumably, the complexity of superimposed ERP responses in the SSVEP measure thus leads to the differences between the SSVEP and the isolated N170 component, as reflected in the more pronounced quadratic relationship for the N170 (Fig. 4) as compared to the SSVEP (Fig. 3). Moreover, to further utilize the spatial information of the EEG and extend our analyses beyond the sensor level, we applied SSD to extract the most pronounced and consistent SSVEP responses across subjects. Although the SSD results also demonstrated realness effects of a non-linear nature, we did not find a clearer quadratic relationship than in the N170 results. This further supports the idea that the SSVEP includes a mixture of the N170 and other visual evoked response components that might not all exhibit the same effect across realness levels.
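The superposition account can be illustrated with a minimal simulation. The sketch below is not the analysis used in this study: the sampling rate, the toy ERP waveform (a P100-like and an N170-like Gaussian peak), and all variable names are assumptions chosen for illustration. It places one copy of a transient ERP at every stimulus onset of a 5 Hz train (200-ms interval) and inspects the spectrum of the summed signal.

```python
import numpy as np

fs = 500            # sampling rate in Hz (assumed for illustration)
stim_rate = 5.0     # one stimulus every 200 ms
duration = 8.0      # seconds of simulated stimulation

# Toy transient ERP (hypothetical shape): a P100-like positive peak
# followed by an N170-like negative peak, lasting 400 ms in total.
t_erp = np.arange(int(0.4 * fs)) / fs
erp = (np.exp(-((t_erp - 0.10) / 0.020) ** 2)
       - 1.5 * np.exp(-((t_erp - 0.17) / 0.025) ** 2))

# Impulse train marking the stimulus onsets.
n_samples = int(duration * fs)
onsets = np.zeros(n_samples)
onsets[::int(fs / stim_rate)] = 1.0

# Superposition: every onset adds one copy of the ERP. Because the ERP
# outlasts the 200-ms interval, consecutive copies overlap.
ssvep = np.convolve(onsets, erp)[:n_samples]

# Single-sided amplitude spectrum of the simulated steady-state response.
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
amplitude = 2 * np.abs(np.fft.rfft(ssvep)) / n_samples
peak_freq = freqs[1:][np.argmax(amplitude[1:])]  # skip the DC bin
print("dominant frequency (Hz):", peak_freq)
```

In this toy setting the summed signal is dominated by the stimulation frequency, and every component of the transient waveform, not only the N170-like peak, contributes to the amplitude at that frequency.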
Nevertheless, given that we observed in-principle corresponding realness effects in both SSVEP and N170, an advantage of SSVEP is its rapid stimulation frequency, thus offering a less time-consuming but still informative way of probing neural correlates of a face stimulus’s realness.
Furthermore, compared with the component at the fundamental frequency (5 Hz), the harmonics might contain additional information at higher frequencies. We did not find a similar quadratic relationship in the amplitudes of the harmonic components (10 Hz and 15 Hz). The even harmonics (i.e., 10 Hz, 20 Hz, etc.) might have been contaminated by the 10 Hz refreshing frequency between the stimuli and the backgrounds. In general, higher harmonics do not indicate the presence of evoked responses at higher frequencies; rather, they reflect the non-sinusoidal nature of neuronal signals67. Importantly, higher harmonics are the first to be affected by the low SNR of the neuronal responses and thus are expected to demonstrate less significant or even absent statistical effects compared to the base frequency (i.e., in our case 5 Hz). Furthermore, according to the scalp topographies in Fig. 5, the 5 Hz component extended towards lateral parieto-occipital regions as compared to the higher harmonics of the SSVEP. This suggests that the 5 Hz component spanned neural processes from early visual perception in medial occipital areas up to specialized face-related neural activity.
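The point that harmonics arise from waveform shape rather than from separate higher-frequency responses can be checked with a toy signal. The following sketch assumes a 500 Hz sampling rate and uses half-wave rectification purely as a convenient example of a non-sinusoidal 5 Hz waveform; neither is taken from the recorded data.

```python
import numpy as np

fs = 500                              # sampling rate in Hz (assumed)
t = np.arange(8 * fs) / fs            # 8 s, i.e. 40 full cycles at 5 Hz
sine = np.sin(2 * np.pi * 5 * t)      # purely sinusoidal 5 Hz response
rectified = np.maximum(sine, 0.0)     # same 5 Hz period, non-sinusoidal shape

def amplitude_at(signal, freq):
    """Single-sided spectral amplitude at `freq` (must fall on an FFT bin)."""
    spectrum = 2 * np.abs(np.fft.rfft(signal)) / signal.size
    return spectrum[int(round(freq * signal.size / fs))]

for name, sig in [("sine", sine), ("rectified", rectified)]:
    amps = [float(round(amplitude_at(sig, f), 3)) for f in (5, 10, 15)]
    print(name, "amplitude at 5/10/15 Hz:", amps)
```

Both signals repeat exactly every 200 ms, yet only the distorted one carries energy at 10 Hz, so a harmonic peak by itself does not imply an additional neural process oscillating at that frequency.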
In addition, many low-level visual features, especially the eyes, may influence the amplitudes of both the SSVEP and the N170. We found that eye size had a negative correlation with the degree of realness in our stimulus set. However, we showed that quadratic models describe the EEG data better than linear models even after linearly regressing out eye size. Thus, the observed nonlinear relationships between EEG response amplitudes and realness levels were not driven by low-level stimulus features such as eye size. Nevertheless, our findings strongly emphasize that such factors need to be carefully controlled in future studies, either during stimulus preparation or by including eye size as a covariate in statistical models. Additionally, for the pair R4 and R5, in which we did not find any difference in eye size or luminosity, the classification algorithm successfully distinguished the two categories. We also found a significant difference in the N170 and SSVEP amplitudes between R4 and R5 (N170: p < 0.001, SSVEP: p < 0.001). These results suggest that a comparison between highly realistic CG images and photos of real people is indeed possible with EEG to further explore how the human brain perceives face realness.
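The logic of this control analysis can be sketched as follows. The data are simulated, the helper names `regress_out` and `aic_poly` are hypothetical, and the use of the Akaike information criterion for model comparison is an assumption made for this illustration rather than a description of the statistics reported here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 20 participants x 6 realness levels (R0..R5).
level = np.tile(np.arange(6.0), 20)
# Eye size shrinks with realness (negative correlation, as in the stimulus set).
eye_size = 5.0 - 0.6 * level + rng.normal(0, 0.1, level.size)
# Simulated amplitude: U-shaped in realness plus a linear eye-size effect.
amplitude = (level - 2.5) ** 2 + 0.5 * eye_size + rng.normal(0, 0.3, level.size)

def regress_out(y, covariate):
    """Remove the linear effect of a covariate (plus intercept) from y."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def aic_poly(x, y, degree):
    """AIC of a least-squares polynomial fit, assuming Gaussian errors."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    n, k = y.size, degree + 1
    return n * np.log(rss / n) + 2 * k

residual = regress_out(amplitude, eye_size)
aic_linear = aic_poly(level, residual, 1)
aic_quadratic = aic_poly(level, residual, 2)
print("AIC linear:", aic_linear, "AIC quadratic:", aic_quadratic)
```

A lower AIC for the quadratic model on the residuals indicates that the curvature in the simulated amplitudes is not explained by the linear eye-size term.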
In our study, we implemented two kinds of spatial filtering methods: SSD and TRCA. SSD was chosen to focus on signals in the narrow frequency band around 5 Hz. In contrast, TRCA was chosen to focus on broad-band signals that are phase-locked to the rapid stimulus presentation. TRCA was achieved by maximizing the cross-session covariances, leading to optimized spatial filters for "task-related" (i.e., stimulus-locked) activity. Another crucial factor affecting the overall classification performance is the length of the time window. In our study, we selected 2 s and 8 s to compare the classification accuracy in time windows of different lengths. For the pair of stimulus categories R4 and R5, the accuracy for 2 s data was lower than for 8 s data, while for the pair R0 and R5, the accuracy did not differ significantly between 2 s and 8 s. In other words, the pair R0 and R5 may need even less data to be classified. However, the classification between R0 and R5 could be affected by other confounds, such as the large eyes in stimulus category R0, given that it is known that the neural processing of face images is affected by this parameter36,37,38. Thus, contrasting the R4 and R5 categories may be most informative in the context of realness-related neural activity due to their comparability in low-level visual features (e.g., eye size). Overall, we suggest that SSVEP-based classification may represent a paradigm that saves experimental time compared to traditional ERP approaches (since it requires shorter data segments) when decoding perceived realness levels from neural data.
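As an illustration of the TRCA idea, the sketch below follows the commonly used formulation in which the spatial filter is the leading eigenvector of Q⁻¹S, with S the summed cross-trial covariance and Q the summed within-trial covariance. This is an assumption about the general method, not a reproduction of the pipeline used here; the function name `trca_filter`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

def trca_filter(trials):
    """Leading TRCA spatial filter.

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns a weight vector of length n_channels.
    """
    trials = trials - trials.mean(axis=2, keepdims=True)  # center each channel
    # Q: sum of within-trial channel covariances.
    within = np.einsum("tcs,tds->cd", trials, trials)
    # S: sum of covariances over all pairs of *different* trials,
    # computed as cov(sum of trials) minus the within-trial part.
    summed = trials.sum(axis=0)
    cross = summed @ summed.T - within
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, cross))
    return np.real(eigvecs[:, np.argmax(np.real(eigvals))])

# Toy data: one stimulus-locked 5 Hz source mixed into 4 channels plus noise.
rng = np.random.default_rng(0)
fs, n_trials, n_channels = 500, 20, 4
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 5 * t)
mixing = np.array([1.0, 0.5, -0.3, 0.0])
trials = (mixing[None, :, None] * source[None, None, :]
          + rng.normal(0, 1.0, (n_trials, n_channels, fs)))

w = trca_filter(trials)
component = w @ trials.mean(axis=0)          # filtered trial average
corr = np.corrcoef(component, source)[0, 1]  # sign of w is arbitrary
print("abs. correlation with the simulated source:", abs(corr))
```

Because only activity that repeats across trials contributes to S, the filter favors stimulus-locked signals regardless of their frequency content, which is what distinguishes it from a narrow-band method such as SSD.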
It should be noted that the classification algorithm was based entirely on the Pearson correlation between filtered templates and filtered testing data. Thus, the classification results are based on a number of diverse spatial and temporal features of neuronal responses. These include the effects of different amplitudes, differences in scalp topography, and potentially also the variability of SSVEP latencies across stimulus conditions. These rich neuronal parameters allowed us to classify the realness level of face images from EEG. This may represent a promising starting point for future studies further pinning down the neural substrates of realness perception, as it is still an open question how the aforementioned complex EEG features (e.g., spatial aspects, amplitude, phase, and other parameters) interrelate with each other. Another important factor in the classification approach is the way the channels are selected. In this study, because SSVEP components are usually located in visual cortical areas42 and to avoid overfitting, we chose nine channels in the parieto-occipital region as the first step of channel selection. However, a broader a-priori channel selection is conceivable in future studies, for example to also assess higher cognitive processes that occur further downstream in the neuronal response cascade. In general, this classification algorithm works well for single-trial detection. Thus, a potential application of this algorithm could be a real-time system that quantifies the realness level of face images by decoding EEG data. A novel detection system that automatically detects realness levels from immediate neural responses might help CG designers to better cross the "valley" in the UV phenomenon.
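The correlation-based decision rule can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: templates are taken to be class-wise reference signals (e.g., trial averages), each class has one spatial filter, and the simulated classes differ only in response latency. The function `classify_by_correlation`, the placeholder filters, and the toy data are hypothetical.

```python
import numpy as np

def classify_by_correlation(test_trial, templates, filters):
    """Return the label whose filtered template correlates best with the trial.

    test_trial: array (n_channels, n_samples).
    templates:  dict label -> array (n_channels, n_samples).
    filters:    dict label -> spatial filter of length n_channels.
    """
    scores = {}
    for label, template in templates.items():
        w = filters[label]
        # Pearson correlation between the two spatially filtered time courses.
        scores[label] = float(np.corrcoef(w @ test_trial, w @ template)[0, 1])
    return max(scores, key=scores.get), scores

# Toy example: two classes whose 5 Hz responses differ in latency (phase).
rng = np.random.default_rng(1)
fs = 500
t = np.arange(2 * fs) / fs                     # 2 s analysis window
mixing = np.array([1.0, 0.8, 0.6])             # hypothetical topography
templates = {
    "R4": np.outer(mixing, np.sin(2 * np.pi * 5 * t)),
    "R5": np.outer(mixing, np.sin(2 * np.pi * 5 * t + np.pi / 2)),
}
filters = {"R4": mixing, "R5": mixing}         # placeholder spatial filters
test_trial = templates["R5"] + rng.normal(0, 1.0, templates["R5"].shape)

predicted, scores = classify_by_correlation(test_trial, templates, filters)
print(predicted, scores)
```

Since the Pearson correlation of a single filtered time course is insensitive to overall scaling, amplitude differences between conditions enter such a classifier mainly through the spatial filters and through the relative weighting of components within the template.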
In conclusion, our study investigated how face images with different levels of stylization modulated the amplitudes of neural responses, including SSVEPs and the N170 component. We found a quadratic relationship between response amplitudes and the degree of realness, which may well correspond to the UV. Of note, face perception is a complex process, which certainly also entails additional neural activities. Taking the UV effect as an example, as suggested in a recent review25, ERP correlates of the UV effect may range from early negative potentials (N170) to late positive potentials. Furthermore, the current study examined realness perception across a very wide range of realness levels (from simple cartoon images to real photographs). To pinpoint the neural correlates of realness perception even further, it would be desirable to "zoom" into realness levels around the uncanny valley in future studies. For instance, would SSVEP and N170 amplitudes show a similar relationship with the stimulus’s realness level for such more subtle differences, and would this correspond to subjective realness perception? Moreover, another promising research avenue may be to utilize such realness-perception correlates in the EEG to inform algorithms for realistic face image generation in a biologically meaningful way.