Unconsciously Guided Behavior With a Bias-Free Measurement of Consciousness

Many researchers believe that unconscious, invisible stimuli can guide behavior, but convincing evidence for this phenomenon is lacking. The controversy results from the difficulty of defining and measuring consciousness in an unbiased way. We used a bias-free 2-interval forced-choice (2IFC) paradigm to study whether the orientation or color of unconscious, masked stimuli can guide behavior. Each trial consisted of two intervals, only one of which contained the target stimulus. The observers were forced to discriminate the orientation or color of the stimulus in each interval, and at the end of the trial they made a 2IFC decision about which of the intervals contained the stimulus. We focused on the trials in which the 2IFC decision was made incorrectly, suggesting unconsciousness of the presence of the target. In masked trials, both orientation and color were discriminated with higher accuracy than expected by chance. Control trials showed that the participants followed the instructions and performed at ceiling in the 2IFC task, indicating that the specific non-perceptual cognitive requirements of the 2IFC task cannot explain the incorrect 2IFC decisions. The present results provide strong evidence for unconsciously guided behavior. Further studies should examine the constraints of this phenomenon with the unbiased procedure.


Introduction
Many laymen and psychologists believe that unconscious visual stimuli can guide behavior [1]. From the perspective of scientific studies, however, this belief in unconscious performance still lacks indisputable evidence. The problem comes down to the difficulty of measuring consciousness. Consciousness refers to subjective experience [2,3], that is, in the context of visual perception, to the subjective experience of seeing. One way of defining unconsciousness [4] is to state that "for a stimulus to be unconscious, an observer's subjective experience of the stimulus should be no different from the subjective experience of nothing at all." In the present study on unconsciously guided behavior, we follow this line of defining conscious and unconscious processing, and consider a stimulus as completely unconscious only if the observer does not have any experience of its presence. Because consciousness is subjective experience, consciousness or unconsciousness of visual stimuli is usually measured by asking the participants to judge their subjective experience [5]. In a typical procedure examining unconscious performance, the observer is presented with a visual target stimulus whose visibility is reduced with visual masking. The observers are forced to make a decision concerning a feature (e.g., orientation) of the target, even if they report no subjective experience of seeing it. When no subjective perception is reported and the decision is made better than expected by chance, unconscious performance is concluded to have occurred. However, evidence for unconscious performance based on visual stimuli has been controversial [1,5,6,7,8,9,10,11,12]. One should note, however, that it is uncontroversial that a considerable amount of preconscious processing occurs before the stimulus reaches consciousness. The controversy here does not concern the existence of unconscious visual processing per se, but whether unconscious (invisible) stimuli can influence the overt responses of the observer.
The concept of criterion content refers to the attributes of a stimulus on which an observer bases the perceptual decision [13]. Part of the conflicting results on unconscious performance can be explained by the criterion content in subjective judgements of consciousness. It seems relatively easy to obtain "unconscious" effects if the criterion for consciousness judgements corresponds to those stimulus features that are measured in the objective discrimination task [8]. This occurs, for example, when the objective task requires discrimination of the orientation and the observers subjectively judge that they did not "see the orientation." On the other hand, when the participants judge that they saw nothing at all (i.e., no stimulus features), the unconsciously driven effect often disappears [5,9], but not always [8,12]. This latter type of unconsciousness, experiencing no difference between the presence and absence of the stimulus, is what we mean when referring to unconscious performance.
The key problem in studies on unconsciously guided performance is the validity of subjective reports as a measure of consciousness. It is well known that the response criterion is conservative, that is, the participants are biased to report no awareness rather than awareness [14]. This makes it possible that the participants in fact have a graded conscious perception of the stimulus, a percept which is not strong enough to pass the response criterion for reporting it, but is sufficient for better than chance-level performance. The criterion for reporting awareness of the presence of a visual stimulus may fluctuate even within single trials, as shown by experiments [8] in which the observers were required to rate their awareness of two different features of the stimulus within the same trial.
An important methodological development in measuring consciousness of visual stimuli was the two-interval forced-choice (2IFC) procedure, developed by Peters and Lau [1]. In their 2IFC, the target was presented in one of two intervals and the participants discriminated its orientation. The measure of awareness consisted of betting on the interval in which the discrimination of the stimulus feature was made with more confidence (Exp. 1 and 2) or in which the stimulus was more visible (control experiment). This procedure can be considered a bias-free measure of consciousness, because the observer has no reason to behave conservatively and to hide any possible, even very weak, subjective experience of the stimulus. Peters and Lau [1] did not find any evidence for unconscious perception of orientation: as soon as the participants were able to discriminate the orientation of the target better than expected by chance, they were also able to bet on the correct interval better than chance. However, they used targets with 13 possible contrast levels between 15% and 90%. An uncontrolled factor in the group-level analysis was the possibility that all or most of the chance-level trials might have been based on targets which had the lowest contrast levels. It is known that the contrast level influences processing of masked stimuli [12,15], leaving open the interpretation that the lowest contrast stimuli were too weak to elicit any kind of cognitive processing. Similarly, a potential source of problems may also have been the masking procedure, in which the targets were presented between three forward and three backward masks, which may have produced masking so strong that no unconsciously guided performance could be detected. The forward masks interfere with feedforward processing [16,17], which is assumed to underlie unconscious behavioral effects [18], and thus it is not clear whether the results will generalize to masking procedures in which no forward masks or fewer backward masks are applied.
In the present preregistered study, we avoided too strong masking by using liminal stimuli that are subjectively invisible on some proportion of the trials and whose presence is perceived on other trials. We tested whether evidence for unconsciously guided performance can be found with a variant of the bias-free procedure (2IFC) for measuring consciousness, in which only a backward mask was used, and by using targets and masks [8] which have been shown to elicit higher than chance-level unconsciously guided performance when unconsciousness was defined with subjective reports (rating "nothing seen" on the four-point Perceptual Awareness Scale [5]). In the current study, each trial included two intervals, only one of which contained the target. The observers were forced to perform a forced-choice discrimination of the target's feature after each interval, and at the end of the trial they had to decide in which of the intervals the target was presented. Thus, here the 2IFC was used as an objective forced-choice task for measuring consciousness. We performed a planned trial-by-trial analysis focusing on the trials in which the 2IFC decision was incorrect. We analysed the proportion of correct feature discriminations in the intervals in which the stimulus was presented, but the participants incorrectly decided that it was present in the other interval (i.e., thereby reporting that the stimulus was not present in the analysed interval). As the participants should not have any reason to hide even their faintest conscious experience, in the incorrect trials they must have been unconscious of the presence of the target. Thus, support for unconscious performance will be found if orientation or color discrimination succeeds better than expected by chance in the incorrect 2IFC trials.
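The trial selection described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the authors' analysis code (the reported analysis was run in R); the `Trial` record, its field names, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # All field names are hypothetical; they only illustrate the selection logic.
    target_interval: int   # 1 or 2: interval that actually contained the target
    chosen_interval: int   # 1 or 2: the participant's 2IFC decision
    target_feature: str    # e.g. "left"/"right" or "blue"/"green"
    responses: tuple       # feature responses given after interval 1 and interval 2


def accuracy_on_incorrect_2ifc(trials):
    """Proportion of correct feature responses in the target-present interval,
    restricted to trials in which the 2IFC decision was wrong."""
    missed = [t for t in trials if t.chosen_interval != t.target_interval]
    if not missed:
        return None  # no incorrect 2IFC trials to analyse
    correct = sum(
        t.responses[t.target_interval - 1] == t.target_feature for t in missed
    )
    return correct / len(missed)


# Toy data, invented for illustration only.
trials = [
    Trial(1, 2, "left", ("left", "right")),    # 2IFC wrong, feature correct
    Trial(2, 1, "right", ("left", "left")),    # 2IFC wrong, feature wrong
    Trial(1, 1, "left", ("left", "right")),    # 2IFC correct: excluded
    Trial(2, 1, "right", ("left", "right")),   # 2IFC wrong, feature correct
]
print(accuracy_on_incorrect_2ifc(trials))
```

Only the response given after the target-present interval enters the score; the response given after the empty interval is ignored, because there is no correct answer for it.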

Participants
Twenty-six healthy students (mean age = 23.0 years, SD = 3.9, range: 19-31; 3 male) from the University of Turku with normal or corrected-to-normal vision participated. The participants were recruited from the introductory psychology courses at the University of Turku and they received course credits. One of them performed below the preregistered criterion of 75% correct in the control 2-interval forced-choice responses (i.e., with 129 ms stimulus-mask asynchrony) and was replaced by a new participant. The sample size was estimated on the basis of the study [8] which used the same target stimuli, masks and apparatus as the present study, but subjective ratings as a measure of awareness. The experiment was conducted in accordance with the Declaration of Helsinki and with the understanding and informed written consent of each participant. The study was approved by the Ethics Committee for Human Sciences at the University of Turku. The study was preregistered at the Open Science Framework (osf.io): the preregistration is available at https://osf.io/9rk5p

Apparatus and stimuli
The stimuli were presented on a 19-inch CRT screen with 1024 x 768 pixel resolution and 85 Hz screen refresh rate. E-Prime 2.0 software (Psychology Software Tools, Inc.) controlled stimulus presentation and recorded responses. The stimuli were Gabor gratings from Koivisto and Neuvonen [8]. The Gabor stimuli were green (RGB 128, 195, 128) or blue (RGB 128, 128, 255) sinusoidal gratings (1.4 cycles/degree), which were tilted 45° left or right, and subtended 5.0° of visual angle from a 40 cm viewing distance. The luminance, measured from the screen, was 41 cd/m² for green and 28 cd/m² for blue. The RGB values were selected on the basis of a pilot study (n = 5 participants) in which the RGB values varied. It showed that with the selected values, the green and blue colors were equally difficult to discriminate in spite of the different luminance levels. The masks were from Koivisto and Neuvonen's [8] Experiments 2 and 3, originally created by first blurring a colorful grid, taken from a previous study's Mondrian masks [19], after which the blurred image was superimposed with transparent grey left- and right-tilted (45°) Gabors with the same size as the stimuli and a spatial frequency of 2.8 cycles/degree. The resulting image was rotated 90, 180, and 270 degrees so that four different masks resulted.

Procedure
The procedure followed the 2-interval forced-choice (2IFC) procedure. In the masked trials, the first interval began with the presentation of the fixation cross in the center of the gray background (20 cd/m²) for 500 ms. It was followed by a blank gray screen for 500 ms, after which a randomly chosen stimulus or a blank screen was presented for 12 ms (i.e., one screen refresh). After a 12 ms blank screen (1 screen refresh), a randomly chosen mask was presented for 24 ms (2 screen refreshes), so that the stimulus-onset asynchrony (SOA) was 24 ms. The participant made the first discrimination response (left vs. right tilt in the orientation task or blue vs. green color in the color task). The response was indicated with the left and right arrow buttons on a standard computer keyboard. The response was followed by the second interval, which began with a blank gray screen for 500 ms, followed by a randomly chosen stimulus or blank screen for 12 ms (i.e., one screen refresh), a 12 ms blank screen (1 screen refresh), and a randomly chosen mask for 24 ms (2 screen refreshes). The participant made the second discrimination response (left vs. right tilt in the orientation task, or blue vs. green in the color task), again with the left and right arrow buttons. In 50% of the trials, the stimulus was presented in the first interval and in 50% in the second interval, in random order. One second after the second response, the words "First" and "Second" appeared on the screen, written one below the other, prompting the participant to perform the 2-interval forced-choice response (2IFC response): the participant had to choose whether the stimulus appeared in the first or second interval, using the up (= first) or down (= second) arrow button. After the response, the text "Seuraava" (next one) appeared, indicating that the next pair of two intervals could be started by pressing the space bar.
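The durations above are nominal values tied to the 85 Hz refresh rate. As a worked example, the following Python sketch converts refresh counts to milliseconds. The constant and function names are illustrative, and the reading of the 129 ms control SOA as a whole number of refreshes is an assumption, not something stated in the text.

```python
REFRESH_HZ = 85                # screen refresh rate reported in the Apparatus section
FRAME_MS = 1000 / REFRESH_HZ   # duration of one refresh, about 11.76 ms


def frames_to_ms(n_frames):
    """Duration in milliseconds of a whole number of screen refreshes."""
    return n_frames * FRAME_MS


def nearest_frames(duration_ms):
    """Whole number of refreshes closest to a nominal duration."""
    return round(duration_ms / FRAME_MS)


stimulus_ms = frames_to_ms(1)          # nominal 12 ms stimulus
soa_ms = frames_to_ms(2)               # stimulus frame + blank frame, nominal 24 ms SOA
mask_ms = frames_to_ms(2)              # nominal 24 ms mask
control_frames = nearest_frames(129)   # assumption: control SOA is a whole number of frames

print(f"one refresh:  {stimulus_ms:.2f} ms")
print(f"masked SOA:   {soa_ms:.2f} ms")
print(f"mask:         {mask_ms:.2f} ms")
print(f"control SOA:  {control_frames} refreshes = {frames_to_ms(control_frames):.1f} ms")
```

Under these assumptions the exact values are slightly shorter than the rounded ones given in the text (about 11.8 ms per refresh and about 23.5 ms for the masked SOA).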
Each participant performed two task blocks, one in which the orientation of the stimulus had to be discriminated and one in which the color had to be discriminated, each preceded by a practice block of 16 pairs of two intervals. A critical stimulus block involved 80 masked interval pairs (24 ms SOA), intermixed in random presentation order with 8 control pairs in which the SOA was 129 ms, and thus the stimulus was more clearly visible. The practice block also included 4 interval pairs with the 129 ms SOA.
The order of the orientation and color task blocks was counterbalanced across participants. Half of the participants performed the orientation block first, followed by the color block, whereas the other half performed them in the reverse order. The computer randomized the orientation and color of the target, the mask version, and whether the target was presented in the first or second interval, in such a way that each of the target-present intervals, orientations, colors, and mask versions was presented equally often in each condition.
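One way to obtain this kind of randomization is to balance each factor separately and shuffle it independently. The Python sketch below illustrates that idea under stated assumptions; it is not the E-Prime script used in the study. It balances each factor marginally rather than fully crossing them (80 pairs is not a multiple of the 2 x 2 x 2 x 4 = 32 factor combinations), and it tracks only one mask per interval pair for simplicity. The function names are hypothetical.

```python
import random
from collections import Counter


def balanced(levels, n, rng):
    """Return n items in random order with every level occurring equally often."""
    if n % len(levels):
        raise ValueError("n must be a multiple of the number of levels")
    items = list(levels) * (n // len(levels))
    rng.shuffle(items)
    return items


def make_block(n_pairs=80, seed=None):
    """Build one block of interval pairs with marginally balanced factors."""
    rng = random.Random(seed)
    factors = {
        "target_interval": (1, 2),
        "orientation": ("left", "right"),
        "color": ("green", "blue"),
        "mask": (0, 90, 180, 270),  # mask version, labelled by rotation in degrees
    }
    columns = {name: balanced(levels, n_pairs, rng) for name, levels in factors.items()}
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


block = make_block(80, seed=1)
for name in ("target_interval", "orientation", "color", "mask"):
    print(name, dict(Counter(pair[name] for pair in block)))
```

Because each factor is shuffled on its own, every level occurs equally often, while the pairing of levels across factors is left to chance.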

Data analyses
The results were analysed following the preregistered analysis plan with R statistical software 3.5.0 [20].
All statistical tests were two-tailed and used an alpha level of .05. The accuracy rates in 2-interval forced-choice (2IFC) responses were compared between orientation and color conditions with a paired-samples t-test in the masked trials, whereas no statistical tests were needed in the control trials as performance was at ceiling.
The accuracy of discriminating orientation and color was analysed using the R [20] packages lme4 [21], sjPlot 2.4.1 [22], and Psycho 0.4.0 [23]. The analysis of accuracy in the masked condition (i.e., 24 ms SOA) was conducted on single trials with generalized linear mixed-effects logistic models (link = logit), using the 'glmer' function in the R package lme4 to test the hypothesis that unconsciously guided responses in discriminating the orientation or color occur at higher than chance level when the interval is discriminated incorrectly. Accuracy in discrimination of orientation/color was the dependent variable. The fixed effects were 2IFC response (incorrect vs. correct; i.e., whether the participant was 'objectively' conscious of the presence of the stimulus) and Feature (orientation vs. color), and their interaction. The random effect was the random slope for 2IFC, as it produced a better fit (AIC = 4539) with the data as compared with that of the random intercept (AIC = 4543). The model was: glmer(accuracy ~ 2IFC*Feature + (2IFC|Participant), family = binomial). Fixed variables were scored as factors in such a way that 2IFC-incorrect trials and orientation were the reference categories (i.e., intercepts). Inferences are based on p-values (p < .05) and 95% confidence intervals (CIs). If the intercept is statistically significant and the 95% CI for the intercept does not include chance level (note that a log odds of zero represents the chance level in glmer models), it means that accuracy is better than chance (50%) in the reference condition.
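To make the chance-level criterion concrete, the following Python sketch maps log odds onto proportions with the inverse logit. The function names are illustrative. The numerical example uses the intercept and 95% CI that the Results section reports for the n = 22 control analysis (beta = 0.50, CI [0.20, 0.81]); the resulting proportions are a back-transformation for illustration, not values reported by the authors.

```python
import math


def inv_logit(log_odds):
    """Convert log odds to a probability (inverse of the logit link)."""
    return 1 / (1 + math.exp(-log_odds))


def above_chance(ci_low):
    """True if the lower CI bound on the log-odds scale lies above 0 (i.e., above 50%)."""
    return ci_low > 0


estimate, ci = 0.50, (0.20, 0.81)   # intercept and 95% CI from the n = 22 analysis

print(inv_logit(0.0))                          # log odds of zero = chance level
print(round(inv_logit(estimate), 3))           # implied proportion correct
print([round(inv_logit(b), 3) for b in ci])    # implied CI on the proportion scale
print(above_chance(ci[0]))
```

Because the inverse logit is monotonic, a CI that excludes zero on the log-odds scale always excludes 50% on the proportion scale, so the test can be read off either scale.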
To confirm that the orientation and color discrimination performance in control trials was at a high level as compared with that in the masked trials, we planned in the preregistration to run the generalized linear mixed-effects model (accuracy ~ 2IFC * Feature + random effect) also on accuracy in the control trials. However, that model failed to converge (there were too few incorrect 2IFC responses; accuracy in orientation and color discrimination was at ceiling). Therefore, we analyzed discrimination accuracy in the control trials for correct 2IFC trials only.

Accuracy in 2-interval forced-choice decisions
We began the analyses by examining the ability of the participants to discriminate in which of the two intervals the target appeared. Accuracy of 2IFC responses in the masked condition did not differ statistically significantly between orientation (M = .80, SD = .11) and color (M = .81, SD = .09) discrimination tasks, t(25) = -0.48, p = .637, d = 0.08, 95% CI [-0.06, 0.04]. Nor was there any difference between orientation and color conditions in 2IFC decision accuracy in the control trials: the performance was at ceiling for both orientation (M = .99, SD = .02) and color (M = .99, SD = .04). One of the participants made one error in the control trials in the orientation condition, and three participants made one error in the color condition. The high performance level in the control trials confirms that the participants were following the instructions and were able to detect the target when the target-mask SOA was long.

Discrimination of orientation and color
Next, we studied the accuracy rates in orientation and color discrimination in masked trials with a generalized linear mixed-effects model on single trials (Figure 2A). Orientation discrimination in correct 2IFC trials succeeded better than in incorrect ones (beta = 0.73, SE = 0.14, 95% CI [0.45, 1.02], z = 5.10, p < .001), and the lack of a Feature x 2IFC interaction (beta = 0.26, SE = 0.17, 95% CI [-0.073, 0.60], z = 1.54, p = .124) suggests that color, like orientation, was discriminated better when the 2IFC decision was correct.
As a non-planned control analysis, we reran the previous analysis excluding the four participants who made errors in 2IFC responses in control trials. The results (n = 22) replicated the finding that the intercept was higher than expected by chance (beta = 0.50, SE = 0.15, z = 3.283, p = 0.001, 95% CI [0.20, 0.81]), which means that in trials with incorrect 2IFC responses the probability of making a correct orientation discrimination was higher than that of making an incorrect orientation discrimination. Color discrimination did not differ from orientation discrimination (beta = 0.035, SE = 0.16, z = 0.21, p = 0.831, 95% CI [-0.29, 0.36]). Performance was better in correct 2IFC responses than in incorrect ones (beta = 0.80, SE = 0.16, z = 5.05, p < .001, 95% CI [0.48, 1.12]), and this effect did not interact with Feature (beta = 0.04, SE = 0.19, z = 0.226, p = .821, 95% CI [-0.33, 0.41]). Thus, the participants who performed the 2IFC choices perfectly in control trials showed above chance-level performance in discriminating the features in masked conditions even when the 2IFC response was incorrect.
Because of the low number of 2IFC errors in the control trials, the discrimination accuracy was analysed only for the control trials in which the 2IFC decision was made correctly. Figure 3A shows the observed discrimination performance in control trials, whereas Fig. 3B displays the modelled results. For the control trials, the overall model predicting discrimination accuracy (accuracy ~ Feature + (1|id)) had an explanatory power (conditional R2) of 7.50%, in which the fixed effects' part was 20.87% (marginal R2). The model's intercept was at 2.95 (SE = 0.55, 95% CI [2.04, 4.42]). Within this model, the effect of Feature was significant (beta = 2.52, SE = 0.68, 95% CI [1.34, 4.07], z = 3.73, p < .001), indicating that color was discriminated better than orientation. However, Fig. 3A shows that for both features most of the participants performed at ceiling.

Comparison of accuracy between feature discrimination and 2IFC tasks
Finally, we compared the accuracy of discriminating the feature (orientation or color) in all masked trials, without differentiating correct and incorrect 2IFC decisions, and the accuracy of the 2IFC decisions in all masked trials (Fig. 4). A repeated-measures analysis of variance with Feature and Task (discrimination vs. 2IFC) as variables showed that the effect of Task was significant, F(1, 96) = 8.20, p = .005, partial omega-squared = 0.07, showing that accuracy in the 2IFC task (80%) was higher than in the feature discrimination task (74%). The effect of Feature (F(1, 96) = 1.70, p = .195, partial omega-squared = 0.01) and the interaction between Feature and Task (F(1, 96) = 1.12, p = .293, partial omega-squared = 0.001) were not statistically significant. These analyses suggest that the 2IFC task was easier than the feature discrimination task.

Discussion
Unconscious performance was studied with a bias-free 2-interval forced-choice (2IFC) procedure. Instead of analysing the correlation between the proportion of correct discrimination responses and the proportion of correct 2IFC decisions [1], we focused on the accuracy of discriminating the orientation and color in those stimulus-present trials in which the observers made the forced-choice decision concerning the presence of the target incorrectly, indicating that they did not consciously detect the target in the interval. The 2IFC can be considered an objective task assessing conscious detection, because in the case of any kind of subjective experience of the stimulus, there should not have been any reason to hide the experience and to respond incorrectly on purpose. The results indicated that basic features of the target (orientation and color) could be discriminated better than expected by pure chance, in spite of the incorrect 2IFC decisions suggesting that the observers did not have subjective experience or knowledge of the target's presence during the intervals.
The 2IFC performance in the control trials was at ceiling. This helps to rule out some potential counterarguments against the interpretation that the orientation and color discrimination decisions were guided by unconscious cognition. First, it might be that the observers did not take the instructions seriously and made the 2IFC decisions randomly or without fully attending to the task, so that some of the erroneous 2IFC decisions might have included consciousness. However, the high performance level in the control trials suggests that the observers took the instructions seriously and followed them carefully. Second, one might argue that the above chance-level feature discrimination performance could occur because the feature discrimination choice required evaluation of only one interval at a time, but the 2IFC decision required evaluation of both intervals, which might have made the 2IFC decisions more demanding than the feature decisions. Because the 2IFC decisions were made at the end of the sequence of two intervals, it might be that the observers forgot in which of the intervals the target appeared, or alternatively, the requirement of performing two different types of decisions in each sequence was otherwise too demanding; therefore the observers simply failed to make the 2IFC response correctly in some of the trials, although they were aware of the target's appearance. This and other possible difficulties in making 2IFC decisions do not seem likely, as the observers could perform them almost perfectly in the control trials.
The above-chance feature discrimination performance without conscious detection of the target cannot be explained by a lower threshold in the feature discrimination tasks than in the 2IFC task, because the comparison of overall performance between the tasks indicated that the 2IFC task was easier than the feature discrimination tasks. This pattern stands in contrast to the pattern of results in Peters and Lau [1], who found that orientation discrimination was easier than the 2IFC task. The interpretation of data and modelling in their study was based on the idea that the feature discrimination task contains one source of uncertainty, while the 2IFC task contains two sources of uncertainty, and hence they expected discrimination accuracy to be higher than 2IFC accuracy. For this reason, the pattern of results (discrimination > 2IFC) was not interpreted as support for the unconscious processing hypothesis. In the present study, the feature discrimination accuracy in the critical masked trials, averaged across correct and incorrect 2IFC responses, was lower (74%) than the accuracy of the 2IFC decision (80%). Thus, deciding the interval in which the target appeared was actually an easier task than discriminating the orientation or color of the target.
In the present study we assessed the extent of unconscious performance using trial-by-trial measures and liminal stimuli instead of subliminal stimuli. The stimuli would have been subliminal, according to the objective threshold definition of consciousness, if the 2IFC decisions had been at chance level. The objective threshold approach carries the risk that the stimulus is masked so far below the threshold that one fails to detect any behavioral effects [24]. This risk was avoided by using masks and stimuli [8] which stayed constant and did not result in too strong masking, and by the trial-by-trial approach. Measuring unconscious cognition in a trial-by-trial manner with liminal stimuli requires that the stimuli are strong enough to exert detectable effects on performance, yet weak enough that observers fail to become conscious of them in a sufficient proportion of trials. We do not know exactly in what proportion of the trials the participants actually were conscious of the stimulus, as both conscious and unconscious processing could have contributed to the 80% accuracy level in 2IFC decisions; by contrast, we can more safely conclude that at least in the remaining inaccurate 2IFC trials (20%) the observers were not aware of the stimulus. In the trial-by-trial approach the stimuli do not have to be unconscious in every trial. The trial-by-trial approach is possible because the threshold of conscious detection fluctuates due to the spontaneous rhythmical activity of neural populations in the brain [25]. For example, alpha power fluctuates over time [26], which can be used to predict stimulus detection. The higher the prestimulus alpha power is, the less likely the stimulus is to be detected in trial-by-trial analysis [27]. Similarly, prestimulus alpha phase predicts conscious detection [28].
We conclude that unconsciously guided performance may occur, but perhaps only under specific conditions. The issue is complicated by several potential stimulus [12] and mask [8] variables whose effects should be systematically studied in further research. The limits of unconsciously guided behavior could be studied further by manipulating the complexity of the stimuli and the level of processing required in discriminating them. In any case, the present demonstration of unconsciously guided behavior with the bias-free 2IFC procedure can be considered convincing evidence for its existence in humans.

Declarations
Data availability
The dataset is available at OSF: https://osf.io/kf7ej/?view_only=207e93df443248bcac0a00a9f01b4905 (for reviewers; will be made public after acceptance).

Figure 1. The sequence of stimulation in a 2-interval forced-choice trial in the orientation condition. The participants made a left-right orientation decision after both intervals and decided at the end of the trial whether the stimulus was present in the first or second interval. The color condition was physically identical, but the decision after each interval was made about the color (blue-green) of the stimulus.

Figure 2. The discrimination of orientation and color in masked trials when the 2-interval forced-choice decision was made incorrectly or correctly. A. The observed results. The unfilled dots represent each participant's mean accuracy score, and the black dots show the whole group's mean accuracy. B. The modelled results. The error bars represent 95% confidence intervals.