Different Top-Down Goals Modulate Early Stage of Attention Bias: Irrelevant Task Suppresses the N170 of Automatic Attention Allocation to Threat Faces

Haoran Dou, Institute for Brain and Psychological Sciences, Sichuan Normal University, Chengdu 610101
Limei Liang, Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029
Jie Ma, School of Psychology, South China Normal University 510631
Jiachen Lu, School of Psychology, South China Normal University 510631
Wenhai Zhang (zwh20120106@163.com), College of Education Science, Hengyang Normal University, Hengyang 421002
Yang Li, Chengdu Medical College, School of Psychology, 610599, Chengdu 610101


Introduction
The facial expression of fear carries rich social information indicating a potential threat that requires immediate allocation of attention 1 . Much previous behavioral research has suggested that individuals show an attentional bias toward fear faces relative to neutral faces 2,3 . Several neuroimaging studies using functional magnetic resonance imaging have shown enhanced activation to fear stimuli relative to neutral stimuli in the amygdala, which modulates bottom-up processes 5,6,7 . For example, Bishop et al. (2004) 6 found an increased amygdala response to fear versus neutral faces regardless of attentional focus in highly anxious participants. Top-down processing, in which the prefrontal cortex plays a key role, may also modulate emotional visual attention [8][9][10][11] . There is not yet a consensus on whether top-down goals can modulate the attention bias arising from bottom-up processing of fearful faces, and the early neural processing involved requires more experimental evidence.
A growing body of psychological evidence indicates that preferential attention to threat stimuli is modulated by top-down signals 3,12−14 . Specifically, the contingent capture hypothesis suggests that attention capture is contingent on the top-down attentional set rather than being a function of bottom-up salience 15 . For example, using a visual search paradigm, Hahn and Gronlund 12 observed that people are faster to recognize angry faces than happy faces against a neutral background, but this effect disappears when the faces are irrelevant to the task goal. However, there is no consensus about the top-down modulation of attention bias for threatening faces. Regarding the debate about top-down and bottom-up processing, Bacon and Egeth 16 suggested that there are two distinct selection modes: the singleton detection mode and the feature search mode. In the singleton detection mode, people automatically detect odd or unique stimuli in a dimension (such as an odd color) known to be irrelevant to the task (bottom-up), whereas in the feature search mode, people search for a target with a particular feature (such as the color red) that they hold in mind beforehand (top-down). They found that goal-directed selection in feature search mode may override the stimulus-driven capture of salient singletons. We aim to investigate how different top-down goals affect attention to threat faces under a specific search mode.
Several paradigms have been employed to explore the attention bias for threatening faces, such as the visual search task 12,17,18 , the emotional Stroop task 19,20 , the dot-probe paradigm 21,22 , and the spatial cueing paradigm 14,15,23,24 . In the visual search task and the emotional Stroop task, the stimuli are presented simultaneously, so stimulus-driven and goal-driven attention cannot be easily separated 25 . In the dot-probe paradigm, threat and neutral stimuli usually appear at the same time. Although participants understand their current goal-driven task, stimulus-driven distractors are always present; thus, the strength of top-down processing is difficult to control, and the results obtained with this paradigm are somewhat one-sided 26 . In contrast, in the spatial cueing paradigm 15 , a cue follows a fixation, and participants then respond to the target after an interval. The target location is either consistent with the cued location (valid cueing) or inconsistent with it (invalid cueing) 27 . Generally, the target is found faster with valid cueing than with invalid cueing 28 , and the inconsistency is weaker with valid cueing than with invalid cueing 29 . Moreover, other studies have manipulated the relevance of threat stimuli to the current task in order to control top-down processing. In these studies, threat stimuli attracted attention when the task was relevant to the emotion, whereas the attention bias to threat stimuli was inhibited when the task was irrelevant to the emotion 3,14 . Although Victeur, Huguet & Silvert 30 adopted different irrelevant tasks in the spatial cueing paradigm, how different top-down goals affect the early neural processing of the attention bias to threatening stimuli remains unclear.
Given their high temporal resolution, event-related potentials (ERPs) have been used to investigate the early automatic processing of fearful faces. Specifically, the N170 is an early face-specific component that ranges from 120 to 220 ms and peaks at about 170 ms post-stimulus 31,32 . Some researchers have found that fear faces induce larger N170 amplitudes than neutral faces 33,34,35 . Moreover, previous studies have claimed that the N170 represents early perceptual processing of faces and can be modulated by top-down influences 31,36,37,38 . In contrast, the vertex positive potential (VPP) is a positive-going component at fronto-central recording sites with a peak and latency similar to those of the N170 39 . Previous research has indicated that, like the N170, the VPP is modulated by facial expression, with larger amplitudes in response to fear faces than to happy and neutral faces 40 . However, other studies indicated that the VPP may be delayed or sometimes attenuated under some conditions, such as when faces are disrupted by inversion or scrambling 41,42 , which differs from the N170. Joyce and Rossion 43 suggested that the different results for the N170 and VPP reported in previous studies were attributable to the location of the reference electrode. Some source localization studies concluded that both components reflect the same brain process in face perception, located in or near the fusiform gyrus 35,[43][44][45] .
In the current study, we used the N170 and VPP as electrophysiological indicators to examine how different top-down goals influence attentional bias for threatening faces. To address this question, we used a modified spatial cueing task with four inconsistent levels defined by cue validity and task relevance (see Table 1). Specifically, the weak level involved a valid cue and a relevant task (emotion recognition task); the medium level involved an invalid cue and a relevant task (emotion recognition task); the strong level involved an invalid cue and an irrelevant task (gender recognition task); and the very strong level involved an invalid cue, an irrelevant task (gender recognition task), and inconsistent genders at the cue position and the target position. Based on previous results 3,14,46,47 , we predicted that the attention bias for cue fear faces would decrease as the inconsistent level increased. Specifically, we expected no significant difference in reaction time between fear and neutral faces at the strong level, but a significant difference at the weak level. Likewise, we did not expect significant differences between fear and neutral faces in N170 and VPP amplitudes at the strong level because of the difficulty in disengagement, whereas we expected significant differences between fearful and neutral faces at the weak level.

Reaction time
Reaction times more than 3 SDs from the mean were treated as outliers and excluded from the analysis. The two-way repeated-measures ANOVA showed a significant main effect of inconsistent level, F(3, 81) = 8.00, p < 0.01, ηp² = 0.54. Post-hoc analyses showed that participants responded faster at the weak level than at the other three levels, ps < 0.01, and faster at the medium level than at the very strong level, p < 0.01. No main effect of cue emotion was observed, F(1, 27) = 0.01, p = 0.91, ηp² = 0.03. In addition, the analysis showed a significant interaction between inconsistent level and cue emotion, F(3, 81) = 4.91, p < 0.01, ηp² = 0.36 (see Fig. 2). Simple effect analyses revealed that, at the weak level, the mean reaction time to cue fear faces (703.31 ± 78.96 ms) was shorter than that to cue neutral faces (750.67 ± 63.82 ms) (p < 0.01); at the medium level, the mean reaction time to cue fear faces (837.72 ± 74.92 ms) was longer than that to cue neutral faces (797.99 ± 57.29 ms) (p < 0.05). However, at the strong and very strong levels, no significant differences in mean reaction time were found between cue fear and cue neutral faces.
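The ±3 SD exclusion rule can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the function name `exclude_rt_outliers` is hypothetical, the example data are made up, and the sketch assumes that the mean and SD are computed over a single list of reaction times (for example, one participant in one condition), which the text does not specify.

```python
from statistics import mean, stdev


def exclude_rt_outliers(rts_ms, n_sd=3.0):
    """Keep reaction times within n_sd sample SDs of the mean.

    Assumption: the mean and SD come from the same list that is filtered.
    """
    if len(rts_ms) < 2:
        return list(rts_ms)  # SD is undefined for fewer than two values
    m = mean(rts_ms)
    sd = stdev(rts_ms)
    return [rt for rt in rts_ms if abs(rt - m) <= n_sd * sd]


# Made-up example: 19 typical trials plus one very slow response.
trials = [700.0] * 19 + [3000.0]
kept = exclude_rt_outliers(trials)
print(len(trials) - len(kept), "trial(s) excluded")
```

With small trial counts a single extreme value inflates the SD, so the grouping over which the mean and SD are computed can change which trials are dropped.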

Accuracy
A two-way repeated-measures ANOVA showed a significant main effect of inconsistent level, F(3, 81) = 17.049, p < 0.001, ηp² = 0.387. Pairwise comparisons showed that accuracy was higher at the weak level than at the other levels (medium, p < 0.01; strong, p < 0.001; very strong, p < 0.001). Accuracy did not differ among the other levels (medium vs. strong, p = 0.300; medium vs. very strong, p = 0.103; strong vs. very strong, p = 1). The

Discussion
We applied a modified spatial cueing paradigm to investigate attention bias to fear faces under different inconsistent levels involving an irrelevant task and an invalid cue. In line with previous studies 2, 35 , we found that fear faces attracted more attention than neutral faces at the weak inconsistent level. Moreover, at the medium inconsistent level (invalid cue), RTs were slower for fear faces than for neutral faces. We also found that the attention bias for fear faces disappeared in the irrelevant task (strong and very strong inconsistent levels). Our ERP results showed that fear faces at the cue position of the target display induced larger N170 amplitudes than neutral faces in the relevant task (weak and medium inconsistent levels), whereas the VPP differed between fear and neutral faces only at the weak inconsistent level. These results indicate that the difficulty of disengaging from the irrelevant task suppressed the automatic allocation of attention to fear faces. Taken together, these results suggest that top-down processing can regulate the early attention bias for fear faces.
One essential finding of the current research was that top-down processing moderates the automatic attentional processing of fear faces. In previous studies, salient stimuli (such as threatening faces, color, motivation, and special shapes) were prioritized in visual processing 48 but were ignored when the task goal demanded paying attention elsewhere. Folk and Remington 49 proposed that top-down control of spatial attention capture does not rule out a role for stimulus salience, with capture being determined by the defining feature of the target. This view, known as the contingent capture hypothesis, has been supported by many studies since it was proposed 13,50,51 . According to the contingent capture hypothesis, in the weak and medium conditions the cue fear faces are defined by features relevant to the target, so these distractor faces are able to attract spatial attention. In contrast, cue fear faces share few features with the target in the strong and very strong conditions, and hence they can be easily overlooked under the top-down goal set. Moreover, we found different emotion effects on RT at the weak and medium levels. At the weak level (valid cue in the emotion recognition task), RTs were shorter for fearful faces than for neutral faces, showing that fearful faces facilitated the orienting of spatial attention. This is reasonable because threatening stimuli are prioritized in the early processing of attentional allocation [52][53][54] . However, in the medium condition (invalid cue in the emotion recognition task), RTs were longer for fearful faces than for neutral faces, which relates to the difficulty in disengaging from fear faces at the cue position. These differing results are also consistent with previous findings. Carlson and Reinke 55 used fearful eye-region stimuli in the dot-probe paradigm to test whether the emotion of the eye region modulated the attention bias, and found across three experiments that fearful eyes facilitated the orienting of spatial attention and delayed disengagement.
For N170 amplitudes, previous research shows that the N170 component may reflect the structural encoding of faces 56 and can be modulated by emotional facial expression 33,57 . In our study, we found significant differences between cue fear faces and cue neutral faces in the weak and medium conditions, but no differences in the strong and very strong conditions. This result was consistent with our hypothesis. The gender recognition task (irrelevant task) involves more and stronger top-down goal-directed processing than the emotion recognition task. Therefore, the effect of difficulty in disengaging from the fear face was suppressed in the irrelevant task, in accordance with previous behavioral results 14,30 . More specifically, Vromen et al. 14 used threatening stimuli (spider figures) and a spatial cueing paradigm with two different tasks; they also found that delayed disengagement from a non-target spider appeared only when the spider was part of the target set, not when it was task-irrelevant. Our results might also be explained by perceptual load theory. According to this theory 58 , the allocation of attention during selective attention is determined by perceptual load, and distractors can be processed only under low perceptual load. In our design, the irrelevant task (gender recognition) has a slightly larger perceptual load than the relevant task (emotion recognition). The N170 showed no difference between fear and neutral faces in the higher perceptual load conditions (strong and very strong levels in the irrelevant task), whereas in the lower perceptual load conditions (weak and medium levels in the relevant task), fearful faces evoked a larger N170 than neutral faces. However, it is hard to separate pure perceptual load from the top-down perceptual set in this study. Future studies should use a design that distinguishes perceptual load from the top-down goal-directed perceptual set.
We found that the VPP, like the N170, was affected by different top-down goals. Specifically, cue fear faces elicited larger VPP amplitudes than cue neutral faces in the weak condition, and there were no significant differences in the medium, strong, or very strong conditions. Moreover, the similar latencies of the N170 and VPP suggest that they may derive from the same neural dipole, which is consistent with previous studies 35,43,45 . Nevertheless, N170 and VPP amplitudes showed different patterns in the medium condition; one possible explanation is that the reference electrode may affect the observed signals and the functional differences between the N170 and VPP components 43 . For example, Itier and Taylor 42 found a larger N170 amplitude in response to inverted faces, but the effect was not significant for the VPP. Despite this, the authors considered both peaks to be part of the same component. In our study, the average reference was used, which yields a large peak at N170 sites and a small peak at VPP sites. Future studies should use more advanced referencing methods, such as the reference electrode standardization technique, to explore the neural mechanisms relating the two components.
In addition, there were no differences in response time, N170 amplitude, or VPP amplitude between cue fear faces and cue neutral faces in the strong and very strong conditions. These results suggest that the attention bias for threatening stimuli was suppressed by the irrelevant task. That is, the distracting effect of the fear face at the cue position was inhibited through suppression of the delayed disengagement from the fear face. However, we did not find any difference in accuracy between fear and neutral faces at any of the four inconsistent levels. This result indicates that accuracy in this design was less sensitive to the emotional effect than other measures, such as RT or the N170.
This experiment had some limitations. First, we did not counterbalance the facial emotion at the target position between the gender recognition and emotion recognition tasks, because we wanted to maintain the disengagement effect from the fearful distractor faces in the irrelevant task (gender recognition task). This might introduce an extra variable when the two tasks are compared, which can be resolved in future research. Second, ERPs have limited spatial resolution. Therefore, we could not determine the activity in the related brain regions, such as the fusiform face area (FFA) and the amygdala. Future studies could adopt techniques with higher spatial resolution, such as magnetoencephalography (MEG).

Conclusion
Our findings provide evidence that fear faces facilitated the orienting of spatial attention and caused difficulty in disengagement in the relevant task, but that this disengagement effect was inhibited in the irrelevant task. Moreover, we provide further early neural evidence that different top-down goals modulate the early stimulus-driven attention bias in the time window of the N170 and VPP.

Methods And Materials
Participants
Twenty-eight healthy college students (14 females; mean age = 22.83 years, SD = 2.95 years) from Liaoning Normal University took part in the experiment and were paid 30 RMB for their participation. They were all right-handed and had normal or corrected-to-normal vision without psychiatric or neurological history. All participants provided written informed consent prior to the experiments. All methods were carried out in accordance with the Declaration of Helsinki 59 and the research protocol was approved by the Ethics Committee of Liaoning Normal University.

Stimuli and apparatus
All pictures were selected from the Chinese Facial Affective Picture System (CFAPS) 60 . The pictures comprised 20 neutral faces and 20 fear faces, with an equal number of male and female faces. All pictures were frontal headshots. The fear and neutral faces differed significantly in valence (mean ± SD: fear = 2.78 ± 0.98, neutral = 4.35 ± 0.12; t(38) = 2.98, p < 0.01), but were similar in arousal (mean ± SD: fear = 5.32 ± 0.54, neutral = 5.31 ± 0.27; t(38) = 0.72, p > 0.05). Each face was displayed within a placeholder box, a black outline measuring 114 × 88 pixels (i.e., 3 × 2.6 cm²). Both the spatial cue and the target box were green; therefore, the cue should produce a reliable cueing effect on target identification, even though it was uninformative as to the actual target location 14 . The stimuli were presented on a 19-inch monitor with a resolution of 1024 × 768 pixels and a 100 Hz refresh rate. The viewing distance was approximately 80 cm.
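The physical box size and viewing distance reported above can be converted to visual angle with the standard formula 2·atan(size / (2·distance)). The sketch below is illustrative only: the paper does not report visual angles, the function name `visual_angle_deg` is hypothetical, and the viewing distance is taken as exactly 80 cm.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


VIEWING_DISTANCE_CM = 80.0  # "approximately 80 cm" in the text, assumed exact here
BOX_SIDES_CM = (3.0, 2.6)   # placeholder box side lengths reported in the text

angles = [visual_angle_deg(side, VIEWING_DISTANCE_CM) for side in BOX_SIDES_CM]
print([round(a, 2) for a in angles])
```

Under these assumptions, each side of the placeholder box subtends roughly two degrees of visual angle.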

Task
We used a modified spatial cueing paradigm with a 4 (inconsistent level: very strong, strong, medium, and weak) × 2 (emotion of the face at the cue position in the target display: fear and neutral) within-subjects design. Participants were comfortably seated in a quiet laboratory and received instructions for the modified spatial cueing task (see Fig. 1). At the beginning of each trial, a central fixation cross and four placeholder boxes were presented for 800 ms in a cross-like arrangement. Each box was positioned with its nearest corner 6 cm from the central fixation cross. Next, the cue (a green frame) appeared at one of the four boxes with equal probability for 150 ms. After another fixation display lasting 150 ms, the target display was presented for 300 ms. The target display consisted of the central fixation cross surrounded by four face pictures. On each trial, there were three non-target faces and one target face marked by the green frame. Once the target appeared, participants had to respond as quickly and accurately as possible by pressing the "F" or "J" key on a keyboard (emotion recognition task: F, fearful face; J, neutral face; gender recognition task: F, male; J, female). The key assignments were counterbalanced across participants. Each trial ended with a blank screen presented for 2000 ms.
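The trial timeline described above can be written out as a small table of events and converted to screen refreshes. This is a sketch under stated assumptions: the event labels and the helper `ms_to_frames` are hypothetical, and the 100 Hz refresh rate is taken from the stimuli and apparatus description.

```python
REFRESH_HZ = 100  # monitor refresh rate reported for the apparatus

# (event label, duration in ms) in presentation order, as described in the text
TRIAL_EVENTS = [
    ("fixation + placeholders", 800),
    ("cue", 150),
    ("fixation", 150),
    ("target display", 300),
    ("blank screen", 2000),
]


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of whole screen refreshes closest to duration_ms."""
    return round(duration_ms * refresh_hz / 1000)


frames = {name: ms_to_frames(ms) for name, ms in TRIAL_EVENTS}
trial_ms = sum(ms for _, ms in TRIAL_EVENTS)
print(frames, trial_ms)
```

At 100 Hz every listed duration is a whole number of 10 ms frames, so no rounding error is introduced by the display.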
We defined the weak inconsistent level as involving no conflict from either inconsistent cue-target spatial locations or an emotion-irrelevant task. Specifically, the green-framed target face was displayed at the same position as the cue. Fear and neutral faces appeared equally often at the cue (target) position. The participants were asked to distinguish the emotion (fear or neutral) of the target face. This emotion recognition task was thus relevant to the face type (emotion: fear or neutral).
In the medium inconsistent level, the conflict arose only from the inconsistent positions of cue and target; that is, the green-framed target face was not displayed at the cued position. The participants completed the same emotion recognition task as in the weak condition.
In the strong inconsistent level, the cue and target positions were inconsistent. Moreover, participants were asked to identify the gender (male or female) of the target face rather than recognize its emotion. The target faces were always neutral so that emotion was irrelevant to the task. In addition, the face at the cue position and the target face were of the same gender. Therefore, compared with the medium inconsistent level, the conflicts in the strong inconsistent level arose from inconsistent spatial cue-target locations as well as from the emotion-irrelevant task.
The very strong inconsistent level was the same as the strong condition, except that the face at the cue position and the target face were of different genders. On account of this inconsistency, participants required more resources to inhibit the automatic processing of the fearful face at the cue position. In all conditions, there were always two female and two male faces, and the targets were 50% male and 50% female.
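The four inconsistent levels described above can be summarized as a small data structure, which makes explicit how the sources of conflict accumulate from weak to very strong. This is an illustrative encoding only: the class, field, and function names are hypothetical, and the gender-match field is left as `None` for the two emotion-recognition levels because the text does not describe it as a manipulated factor there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InconsistentLevel:
    name: str
    cue_valid: bool  # target appears at the cued position
    task: str        # "emotion" (relevant) or "gender" (irrelevant)
    cue_target_same_gender: Optional[bool]  # None: not manipulated in the text


LEVELS = [
    InconsistentLevel("weak", True, "emotion", None),
    InconsistentLevel("medium", False, "emotion", None),
    InconsistentLevel("strong", False, "gender", True),
    InconsistentLevel("very strong", False, "gender", False),
]


def conflict_sources(level):
    """List the sources of conflict that the text attributes to a level."""
    sources = []
    if not level.cue_valid:
        sources.append("invalid cue")
    if level.task == "gender":
        sources.append("emotion-irrelevant task")
    if level.cue_target_same_gender is False:
        sources.append("cue-target gender mismatch")
    return sources


for level in LEVELS:
    print(level.name, conflict_sources(level))
```

Read this way, each level adds exactly one source of conflict to the level below it.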
Each participant completed 36 practice trials and 384 test trials. Each of the four conditions made up 25% (96 trials) of the test trials. The four conditions were divided into two blocks according to task (emotion recognition and gender recognition). The experimental instructions were presented to participants before each block. In the emotion recognition block, trials of the weak and medium conditions were presented in random order. In the subsequent gender recognition block, trials of the strong and very strong conditions were likewise randomized. Participants rested for more than 60 s after every 96 trials.

Electrophysiological recording and analysis
Electroencephalographic (EEG) data were recorded using 64 Ag/AgCl electrodes mounted in a Quick-cap (conforming to the International 10-20 System). The data were referenced online to the CPz electrode. Horizontal electrooculography (EOG) data were recorded from two electrodes placed at the outer canthi of the eyes. Vertical EOG data were recorded from electrodes placed on the infra-orbital and supra-orbital regions of the left eye. Electrode impedance was kept below 5 kΩ. The sampling rate was 500 Hz. The signal was band-pass filtered at 0-104 Hz and stored for offline analysis. EEG processing and ERP analysis were performed using Brain Vision Analyzer 2 software (Brain Products, Germany). Where an electrode could not record usable data (e.g., loose or faulty electrodes), its signal was interpolated from nearby electrodes. To remove movement artifacts, we manually inspected the raw data for obvious drift and other artifacts before the analysis (less than 5% of the data). A semi-automatic algorithm based on independent component analysis was used to perform blink correction 61 . The EEG data were re-referenced offline to the average reference and band-pass filtered at 0.1-30 Hz (24 dB/octave). The ERP waveforms were time-locked to the onset of the target stimulus; epochs ran from 100 ms pre-stimulus to 1000 ms post-stimulus, with the 100 ms pre-stimulus interval serving as baseline. Epochs with amplitudes exceeding ±75 µV were automatically rejected from averaging.
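The baseline correction, amplitude-based rejection, and averaging steps can be sketched for a single channel as below. This is not the Brain Vision Analyzer pipeline used in the study: all function names are hypothetical, the epochs are plain lists of microvolt values, and the sketch assumes that baseline correction is applied before the ±75 µV criterion, an ordering the text does not state.

```python
SRATE_HZ = 500      # sampling rate from the text
BASELINE_MS = 100   # pre-stimulus baseline at the start of each epoch
REJECT_UV = 75.0    # absolute amplitude threshold for rejection


def baseline_correct(epoch_uv, srate_hz=SRATE_HZ, baseline_ms=BASELINE_MS):
    """Subtract the mean of the pre-stimulus samples from every sample."""
    n_base = int(baseline_ms * srate_hz / 1000)  # 50 samples at 500 Hz
    base = sum(epoch_uv[:n_base]) / n_base
    return [v - base for v in epoch_uv]


def keep_epoch(epoch_uv, threshold_uv=REJECT_UV):
    """True if no sample exceeds the absolute amplitude threshold."""
    return all(abs(v) <= threshold_uv for v in epoch_uv)


def average_accepted(epochs_uv):
    """Baseline-correct each epoch, drop rejected ones, and average the rest.

    Returns the averaged waveform and the number of accepted epochs.
    """
    corrected = [baseline_correct(e) for e in epochs_uv]
    accepted = [e for e in corrected if keep_epoch(e)]
    if not accepted:
        return [], 0
    n = len(accepted)
    erp = [sum(samples) / n for samples in zip(*accepted)]
    return erp, n
```

A usage call would pass a list of equal-length epochs for one channel and one condition, for example `average_accepted(epochs)`, and repeat this per channel and condition.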
In the present study, the peak amplitudes of the N170 and VPP components were analyzed. Based on previous findings 62, 63 and the topographical distribution of the grand-average ERP activity, we selected the following electrodes of interest: P7, P8, PO7, and PO8 for the N170 in the time window of 150-200 ms; and Fz, FCz, Cz, C1, and C2 for the VPP in the time window of 150-200 ms. Following Luck & Gaspelin 64 , we averaged the values across each electrode group to reduce the number of analyses and the potential for type I error in the analyses of variance. Therefore, the statistical analysis did not include electrode as a factor.
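The peak measurement described above can be illustrated as follows. The sketch makes two assumptions that the text does not settle: the peak is taken per electrode and then averaged across the electrode group (rather than taking the peak of the group-averaged waveform), and the epoch starts at -100 ms with a 500 Hz sampling rate. All function names are hypothetical.

```python
def window_indices(epoch_start_ms, srate_hz, lo_ms, hi_ms):
    """Inclusive sample indices covering lo_ms..hi_ms relative to stimulus onset."""
    step_ms = 1000.0 / srate_hz
    first = int(round((lo_ms - epoch_start_ms) / step_ms))
    last = int(round((hi_ms - epoch_start_ms) / step_ms))
    return first, last


def peak_amplitude(wave_uv, first, last, polarity):
    """Most negative (N170-like) or most positive (VPP-like) value in the window."""
    segment = wave_uv[first:last + 1]
    return min(segment) if polarity == "negative" else max(segment)


def group_peak(waves_uv, electrodes, first, last, polarity):
    """Mean of the per-electrode peak amplitudes across an electrode group."""
    peaks = [peak_amplitude(waves_uv[e], first, last, polarity) for e in electrodes]
    return sum(peaks) / len(peaks)


N170_SITES = ["P7", "P8", "PO7", "PO8"]
VPP_SITES = ["Fz", "FCz", "Cz", "C1", "C2"]
first, last = window_indices(-100, 500, 150, 200)  # the 150-200 ms window
```

Given a dictionary mapping electrode names to averaged waveforms, `group_peak(waves, N170_SITES, first, last, "negative")` would yield one N170 value per participant and condition, and the same call with `VPP_SITES` and `"positive"` would yield the VPP value.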
Two-way repeated-measures analyses of variance (ANOVAs) were conducted with the ERP measures and reaction time as dependent variables and with inconsistent level (very strong, strong, medium, and weak) and cue emotion (the emotion of the face at the cue position in the target display: fear and neutral) as within-subjects factors. Degrees of freedom were corrected using the Greenhouse-Geisser method. In addition, the Bonferroni correction was applied to multiple comparisons and post-hoc analyses. All analyses were conducted using SPSS 19.0.
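The Bonferroni correction for the pairwise comparisons among the four inconsistent levels can be sketched as below. The function name `bonferroni_adjust` and the uncorrected p-values are hypothetical and serve only to show the arithmetic; the sketch adjusts each p-value by multiplying it by the number of comparisons and capping the result at 1, which is one common way of reporting Bonferroni-adjusted values.

```python
from itertools import combinations

LEVEL_NAMES = ["weak", "medium", "strong", "very strong"]


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


pairs = list(combinations(LEVEL_NAMES, 2))  # all pairwise level comparisons
print(len(pairs), "pairwise comparisons")

# Hypothetical uncorrected p-values, one per pair, for illustration only.
raw_p = [0.001, 0.0001, 0.0001, 0.05, 0.02, 0.6]
adjusted = bonferroni_adjust(raw_p)
```

Because adjusted values are capped at 1, a reported post-hoc value of p = 1 is what this kind of adjustment produces for a comparison with a large uncorrected p-value.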