Can humans become more random?

The human ability for random-sequence generation (RSG) is limited but improves in a competitive game environment with feedback. However, it remains unclear whether RSG during games can improve when people are explicitly informed that they must be as random as possible to win the game, nor is it known whether any such improvement transfers outside the game environment. To investigate this, we designed a pre/post intervention paradigm around a rock-paper-scissors game followed by a questionnaire. During the game we manipulated participants' level of awareness of the computer's strategy; they were either (a) not informed of the computer's algorithm or (b) explicitly informed that the computer used patterns in their choice history to beat them, so they must be maximally random to win. Using a novel comparison metric, our results demonstrate that human RSG can reach levels statistically indistinguishable from computer pseudo-random generators in a competitive-game setting. However, our results also suggest that human RSG cannot be further improved by explicitly informing participants that they need to be random to win. In addition, the higher RSG in the game setting does not transfer outside the game environment. Furthermore, we found that the underrepresentation of long repetitions of short patterns explains about a third of the variability in human RSG, and we discuss what might make up the remaining two thirds of RSG variability. Finally, we discuss our results in the context of the "Network-Modulation Model" and we ponder their potential relation to findings in the neuroscience of volition.


Introduction
Randomness, particularly random series generation (RSG), is often used in neuroscience and psychology to increase cognitive load, e.g., as a distractor task from a main task [1][2][3] . However, in the neuroscience of volition, for example, the ability to act randomly is interesting as an upper bound on arbitrary action. In other words, asking someone to make decisions as randomly as possible sets an upper limit on how arbitrary their action can be. This assertion follows from the assumption that arbitrary action, devoid of any reasons or other constraints, reflects the most distilled type of free choice [4][5][6][7] . While this view has been criticized [8][9][10][11] , it remains central in the field.
When instructed to carry out RSG, humans are able to generate series that are empirically equiprobable. However, humans typically fail to generate series that are sequentially independent 12,15 . To delve deeper into this concept, let us first define a run in a series as a sequence of the same entry (e.g., the series (0,0,0,1,1,0) is made up of 3 runs: the three consecutive 0's, the two consecutive 1's, and the final 0). Now, humans tend to produce systematically biased series, switching too often between entries, and thus underrepresenting long runs in their generated series 15,16 . It has nevertheless been found that, with detailed instructions and when they can see their generated series, human RSG becomes more sequentially independent and thus more random 15,16 . In particular, in a competitive environment with feedback (e.g., a matching-pennies game), humans exhibit higher sequential independence 13,16 . However, while it appears that humans can be more or less random in certain contexts, it remains unclear how any increase or decrease in randomness in humans compares to non-human objective benchmarks, e.g., how it compares to pseudorandom series generation (pseudo-RSG) by computer algorithms.
Here we aim to integrate three strands of research across multiple disciplines. One pertains to how human behavior is shaped by competitive contexts, also studied in economics and game theory 13,16 . The second relates to comparing human RSG to computer pseudo-RSG: the measurement of random behavior in relation to a comparable random (but non-human) process. And the third contextualizes the implications human RSG has for understanding cognitive processes within multiple fields of psychology related to volition, decision-making, psychopathology, and the underlying brain structures that support these processes. In particular, we aim to answer the following questions. (1) Can human RSG in a game situation be improved when specifically informing people that they must be as random as possible to win the game? (2) Is human RSG in a game environment similarly random to the pseudo-RSG commonly used in modern programming languages? (3) Is the improved RSG exhibited in a game environment transferable to a post-game environment? (4) To what extent are changes in RSG ability related to changes in the length of generated runs?

Participants
153 undergraduate students (112 female, age 20.7±2.2 years (mean±SD)) were recruited through the Psychology Department participant pool at the University of California, Los Angeles (UCLA). The study was approved by UCLA's North General Institutional Review Board (NGIRB; IRB#15-001094). All procedures were performed in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants, and all individuals received course credit for participation.

Experimental design and procedure
The experiment as a whole was designed as a 3 × 1 between-participant study, with the conditions being (1) Unaware, (2) Aware, and (3) Control. Each experimental session was divided into 3 parts, with 200 trials per part, separated into 2 blocks of 100 trials. All participants completed all 3 parts of the experiment. In each trial, participants selected Rock (R), Paper (P), or Scissors (S) by pressing the <G>, <H>, or <B> keys on the keyboard, respectively. In the first (pre-game) part of the experiment, participants were instructed to endogenously generate a sequence of R-P-S that was as random as possible. For the second (game) part of the experiment, participants were told they would be playing R-P-S against the computer, and that their goal was to win as many points as possible. During this game part, the computer either did or did not use a strategy to predict the participant's next move, depending on the experimental condition (see below for details). In the third and final (post-game) part of the experiment, the instructions from the baseline part were repeated. The progress of a trial in the experiment is depicted in Figure 1.
***** INSERT FIGURE 1 ABOUT HERE *****
The participant and the computer started the game with 100 points each. The winner was the player with more points at the end of the game. Each player gained or lost one point for every trial they won or lost, respectively. Ties left the score unchanged. Participants were further randomly assigned to one of 3 conditions during the game part of the experiment. All were instructed to do their best to win the game. However, participants in the Unaware and Control conditions were told nothing about the computer's strategy. Those in the Aware condition were explicitly informed that the computer would try to predict their moves, and it was stressed that to win they must be as random as possible (Table 1). In the Unaware and Aware conditions, the computer sought out behavioral patterns in the participant's choice history (e.g., R-R-P-P-S-S-…) and their win/lose/tie patterns (e.g., win-stay-lose-switch) and used those against them (for details on the algorithm see Barraclough et al. 17 and Maoz et al. 18 ). So, the participant's best strategy was to be as random as possible, minimizing the computer's ability to find patterns in their choice history. In the Control condition, the computer followed the pattern R-P-S-R-P-S-… with 85% probability on any given trial; on the remaining 15% of trials it chose R, P, or S equiprobably. Participants were then given a post-experiment questionnaire asking them about the task and their performance in the game.
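The Control-condition policy described above can be sketched as follows (a minimal illustration with hypothetical function names; the pattern-exploiting algorithm used in the Unaware and Aware conditions is described in refs. 17,18 and is not reproduced here; indexing the R-P-S cycle by trial number is our reading):

```python
import random

BEATS = {'R': 'S', 'P': 'R', 'S': 'P'}  # key beats value

def control_move(trial, rng):
    """Control-condition computer move: follow the fixed R-P-S cycle
    with probability 0.85, otherwise choose R, P, or S equiprobably."""
    if rng.random() < 0.85:
        return 'RPS'[trial % 3]
    return rng.choice('RPS')

def score_round(player, computer):
    """Trial outcome from the participant's side: +1 win, -1 loss, 0 tie."""
    if player == computer:
        return 0
    return 1 if BEATS[player] == computer else -1
```

Against this opponent, a participant who plays the counter-cycle P-S-R (each entry beating the corresponding cycle entry) wins whenever the computer follows the cycle, i.e., on about 90% of trials in expectation (85% cycle trials plus a third of the 15% uniform trials), which is why the Control condition was intentionally easy to beat.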
Of our 153 participants, 105 answered the post-experiment questionnaire. We asked them to rate how much they agreed with the following 7 statements, each on a Likert scale of 1 to 7 (1 = strongly disagree, 7 = strongly agree):
1) Whether they used a strategy against the computer.
2) Whether they felt that the computer was predicting their moves.
3) Whether they managed to create very random sequences in part 1 (before the game) and part 3 (after the game).
4) Whether the sequence they created in part 3 was more random than the one they created in part 1.
5) Whether they won most of the trials during the game.
6) Whether they tied on most trials during the game.
7) Whether they lost most of the trials during the game.

Participant-and trial-inclusion criteria
***** INSERT FIGURE 2 ABOUT HERE *****
In the post-experiment questionnaire, some participants reported frustration during the game part that resulted in loss of interest in the entire experiment, because they were generally losing and unable to beat the computer. We wanted to analyze only the data of participants who remained engaged with the experiment, continued to make an effort to win throughout the game part, and further made an effort to be random during the post-game RSG part. We therefore included only participants who fulfilled both of the following inclusion criteria: (1) deviation from an even (i.e., 1/3) R-P-S split was no more than 0.15 in any part of the experiment. Of the 153 participants, 32 (20.9%) did not meet both these inclusion criteria and were excluded from further analysis. Of those, 21 of 61, 9 of 50, and 2 of 42 participants were excluded from the Unaware, Aware, and Control conditions, respectively. We expected the most participants to be excluded in the Unaware condition, followed by the Aware condition, and the fewest in the Control condition. This is because it was rather difficult to beat the computer (in Fig. 3 that would entail a difference > 0): participants in the Unaware condition had to learn for themselves how to beat the computer in the game part, whereas those in the Aware condition were told how to beat it. In contrast, in the Control condition, it was intentionally simple to beat the computer. Given the above, we assigned more participants to the Unaware condition during randomization. We therefore ended up with 40, 41, and 40 participants in the Unaware, Aware, and Control conditions, respectively. There were also two types of errors that could occur on each trial: (1) the participant did not press a key within 500 ms of the Go signal, or (2) the participant pressed a key other than <G>, <H>, or <B>. Both errors resulted in a forfeited trial.
Further, in the game part, a point was awarded to the computer in such cases without a loss of a point for the participant. The first error occurred on 3.1±2.6% of the trials and the second on 0.5±0.8% of the trials (mean±STD). Those trials were removed from further analysis.
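Inclusion criterion (1) above can be written as a short check (a sketch with a hypothetical function name, reading "deviation" as the absolute difference between each choice's relative frequency and 1/3):

```python
def passes_split_criterion(choices, tol=0.15):
    """True if no choice's relative frequency deviates from 1/3 by more
    than tol; `choices` is a sequence of 'R', 'P', and 'S' entries."""
    n = len(choices)
    return all(abs(choices.count(c) / n - 1 / 3) <= tol for c in 'RPS')
```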

Measure of randomness
To quantify the randomness of a given sequence, we used a normalized Lempel-Ziv Complexity (LZC) score 19,20 , similar to previous research 21 . We did not use any measures of randomness that directly rely on sequential dependence because, in the game part of the Unaware and Aware conditions, the computer algorithm relied on sequential dependence to play against the participant. Thus, using a similar pattern-matching algorithm to test for randomness could amount to double-dipping, as participants had already been evaluated against this measure via feedback from the algorithm in the game.
The LZC score was obtained by compressing each human-generated sequence using the variable-rate compression algorithm by Ziv and Lempel 20 . We then normalized the LZC score by generating 1000 pseudorandom sequences of the same length as the human-generated sequence and dividing the human-generated sequence's score by the mean LZC score of the pseudo-randomly generated sequences 22 . This removed the effect on the raw LZC score of small variations in sequence length, due to the removal of error trials, as a shorter sequence may naturally be less complex than a longer one simply because of its length. Thus, a normalized LZC score closer to 1 indicates that a sequence is more complex, and thus more random, whereas a score closer to 0 indicates that a sequence is less random. For example, take the sequence [3 2 1 2 1 2 1 2 2 3] of length 10. After compression with the LZ78 algorithm, the sequence is stored as [3 2 1 257 269 2 3] (see Ziv and Lempel 20 for details), so the compressed sequence has length 7. To normalize, we take 1000 pseudorandom sequences of length 10, say s_1 = [1 2 1 1 3 1 2 2 2 1], s_2 = [2 1 3 1 1 2 1 2 1 2], …, s_1000 = [3 3 2 2 3 3 1 2 3 3]. The normalized LZ score is then 7 / mean_i(length(LZ78(s_i))).
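To make the procedure concrete, here is a minimal sketch of the two steps (our illustration: `lz78_length` counts phrases in the LZ78 incremental parse rather than reproducing the exact codeword output of ref. 20, and the fixed seed is arbitrary):

```python
import random

def lz78_length(seq):
    """Number of phrases in the LZ78 incremental parse of seq.
    Each new phrase extends a previously seen phrase by one symbol, so a
    less compressible (more random) sequence yields more phrases."""
    phrases = set()
    phrase = ()
    count = 0
    for sym in seq:
        phrase = phrase + (sym,)
        if phrase not in phrases:
            phrases.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # unfinished final phrase still occupies one slot
        count += 1
    return count

def normalized_lzc(seq, alphabet=('R', 'P', 'S'), n_baseline=1000, rng=None):
    """LZ78 length of seq divided by the mean LZ78 length of n_baseline
    pseudorandom sequences of the same length over the same alphabet."""
    rng = rng or random.Random(0)
    baseline = [lz78_length([rng.choice(alphabet) for _ in range(len(seq))])
                for _ in range(n_baseline)]
    return lz78_length(seq) / (sum(baseline) / len(baseline))
```

A maximally repetitive sequence parses into few long phrases and so scores well below 1, while a pseudorandom sequence of the same length scores near 1, matching the interpretation of the normalized score given above.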

Permutation tests
To compare the randomness scores of the subjects to those generated by computer pseudorandom number generators, we first had to understand the range of the pseudorandom normalized LZ scores. We therefore generated 1000 pseudo-random sequences of R-P-S of length 192 (the length of the average human sequence). Then, for each pseudorandom sequence separately, we ran the identical code that we used on the human data to compute its randomness score. This gave us 1000 randomness scores to which we could compare the randomness scores computed on the human data. The pseudo-random sequences were generated by Python's pseudo-random-number generator (using the Mersenne twister algorithm). We then took the outer 5% of this randomness-score distribution as the critical threshold (alpha level) for LZC scores (see below) to designate significant deviations from chance level. The bootstrapped 2.5th to 97.5th percentile range of the normalized LZ score distribution was [0.973, 1.027] for a sequence length of 192. Thus, for a sequence as random as Python's random-number generator, the normalized LZ score can be expected to fall between 0.973 and 1.027 at a 5% alpha level in a two-tailed test. Because the sequences were all generated empirically, the upper range of the normalized LZ score can exceed 1 if the numerator sequence's LZ score happened to be larger than the mean LZ score of the 1000 reference sequences, despite the theoretical upper limit of 1 for a normalized LZ score.
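The empirical chance interval can be sketched generically (an illustration with hypothetical names; `score_fn` stands in for the normalized-LZC computation, replaced here by a simple longest-run statistic so the sketch stays self-contained; Python's `random` module uses the Mersenne twister, as in the paper):

```python
import random

def chance_interval(score_fn, length=192, n=1000, alpha=0.05, seed=0):
    """Empirical (alpha/2, 1 - alpha/2) percentiles of score_fn over n
    pseudorandom R-P-S sequences of the given length."""
    rng = random.Random(seed)
    scores = sorted(score_fn([rng.choice('RPS') for _ in range(length)])
                    for _ in range(n))
    return scores[int(n * alpha / 2)], scores[int(n * (1 - alpha / 2)) - 1]

def longest_run(seq):
    """Length of the longest run of identical entries (stand-in score)."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best
```

A human score is then called significantly non-random if it falls outside the returned interval, exactly as the [0.973, 1.027] bounds are used for the normalized LZ score.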

Explanatory variable: average run length
We also computed an explanatory statistic, the run-length: the mean length of a run in a series. For instance, the run-length of the sequence R-R-R-P-P-S-P-P is the mean of [3, 2, 1, 2], which is 2. The run-length is related to the randomness of a sequence in that the longer the run-length, the fewer runs there are in a sequence of fixed length. We bootstrapped this measure, similarly to the randomness scores above, to obtain an empirical distribution of mean run-lengths. The 2.5th to 97.5th percentile range was [1.363, 1.658]. For a perfectly sequentially independent sequence over 3 equiprobable items, the expected run-length is 1.5.
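The run-length statistic is straightforward to compute (a sketch; for an i.i.d. equiprobable 3-symbol source each run length is geometric with continuation probability 1/3, giving the expected run-length 1/(2/3) = 1.5 cited above):

```python
import itertools
import random

def mean_run_length(seq):
    """Mean length of the maximal runs of identical entries,
    e.g. R-R-R-P-P-S-P-P -> runs [3, 2, 1, 2] -> 2.0."""
    runs = [sum(1 for _ in group) for _, group in itertools.groupby(seq)]
    return sum(runs) / len(runs)

# Monte Carlo check that sequentially independent sequences sit near 1.5
rng = random.Random(1)
sims = [mean_run_length([rng.choice('RPS') for _ in range(200)])
        for _ in range(200)]
```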

R-P-S game results
Our manipulation in the game part worked well. Participants found it difficult to beat the computer in both the Aware and Unaware conditions, where the winning strategy required them to be random. On the other hand, participants in the Control condition, where the winning strategy was to almost always follow the simple P-S-R pattern, found it easy to win (Fig. 3). This is also evidenced by the fact that, after exclusion of participants who gave up on the task (see Methods), the average final scores for the participants in the Unaware, Aware, and Control conditions were 83.2±10.5, 83.5±12.3, and 143.1±49.7 (mean±STD), respectively. A final game score over (under) 100 indicates that the participant beat (lost to) the computer. Overall, the experimental manipulation created a large effect 23 . Similarly, after exclusions (see Methods), 3 of 40 (7.5%), 4 of 41 (9.8%), and 31 of 40 (77.5%) participants beat the computer in the Unaware, Aware, and Control conditions, respectively. Hence, instructing the participants that they needed to be as random as possible to beat the computer in the Aware condition resulted in no statistically significant increase in their ability to beat the computer (Welch's ANOVA, F(2, 118) = 43.32, p < 0.001, η² = 0.49; post-hoc t-test with Bonferroni correction for Unaware vs Aware, t(80) = -0.30). In a Bayesian ANOVA 24 , the model with condition as an effect versus the null model with no effects had a Bayes Factor (BF) of 6.32 × 10¹⁴, which is overwhelming evidence for the alternative model. A post-hoc comparison corrected for multiple testing between the Unaware and Aware conditions showed a Bayes Factor of 0.24, which provides moderate evidence for the hypothesis that there was no difference in wins/losses between the two experimental conditions 25 . (The strong evidence for the alternative model in the main comparison was driven by comparisons between the Control and experimental conditions.
Post-hoc Bayesian t-tests for Unaware vs Control had a BF₁₀ = 2.55 × 10¹⁰, and for Aware vs Control, BF₁₀ = 2.94 × 10⁹.)

Randomness scores
We measured the randomness of the series that our subjects generated using LZC scores (see Methods). The distributions of the LZC scores for each condition and part of the experiment are shown as violin plots in Figure 4.
We separated the analyses for the experimental and control conditions because the Control condition was specifically designed to elicit the opposite effect (i.e., to minimize randomness) during the game part. So, any effects we might find would be obscured and averaged out if pooled across Unaware, Aware, and Control (see Methods and Fig. 4). We therefore conducted two separate repeated-measures ANOVAs with a between-subjects factor (condition: Unaware, Aware) and a within-subjects factor (experiment part: baseline, game, post-game) to compare the LZC scores across conditions and experimental parts. The first ANOVA compared the experimental conditions (Unaware vs. Aware) across experimental parts, while the second looked only at the Control condition.
Our results suggest that, in both the Aware and Unaware conditions, subjects were on average significantly more random in the game part than in the baseline and post-game parts. At the same time, we found evidence that our subjects were similarly random in the Aware and Unaware conditions in the game part, again on average. For the Unaware vs Aware ANOVA, the LZC scores differed significantly across the experiment parts. (We ran an additional analysis, where we tested a Bayesian repeated-measures ANOVA model with experiment part as the only factor versus the null model with no effects. We found that the model with experiment part had the highest Bayes Factor, 2.99 × 10⁸, which is overwhelming evidence for the alternative model. The model with experiment part and condition both as factors and the model with both factors plus their interaction had Bayes Factors of 9.13 × 10⁷ and 1.17 × 10⁷, respectively. Post-hoc comparisons corrected for multiple comparisons found that the Unaware vs. Aware comparison had a Bayes Factor of 0.24, again supporting (with moderate evidence) the hypothesis that there was no difference in LZC scores across the experimental conditions. To further investigate the lack of significance between the pre-game and post-game LZC scores found earlier, we conducted another post-hoc comparison but found a Bayes Factor of 1.39, which leaves us with inconclusive evidence on whether the pre-game and post-game LZC scores came from the same distribution.) We also tested LZC scores on a subject-by-subject basis. In the Unaware condition, 29 of the 40 participants (72.5%) were more random (according to their LZC scores) during the game than during both pre- and post-game, which is significantly more than expected by chance (binomial test p < 0.001; chance level is 25%).
Similarly, in the Aware condition, 23 of the 41 participants (56.1%) were more random during the game than in both pre- and post-game, again more than expected by chance (binomial test p < 0.001). Hence, both the average and individual LZC scores suggest a pattern of subjects being more random in the game part compared to the baseline and post-game parts for the Aware and Unaware conditions. Importantly, we also found that during the game part the LZC scores in the Aware and Unaware conditions were statistically indistinguishable from those of an equivalent computer-generated pseudo-random series, according to a permutation test (Fig. 4).
For the Control condition, subjects' scores versus those of the computer (Fig. 3) suggest that participants realized that the computer was generally using a consistent R-P-S strategy, with rare deviations (see Methods). Hence, we found that, as per our design, subjects were less random on average during the game part. As for subject-by-subject results, 8 out of 40 were more random during the game than during both pre- and post-game (binomial test p = 0.82) and 24 of 40 were less random during the game than during both pre- and post-game (binomial test p < 0.001). So, taken together, these results suggest a pattern of subjects being less random in the game part compared to the baseline and post-game parts for the Control condition.

Run-length measure
Investigating further why participants' LZC scores varied in the above pattern, we looked at the lengths of the runs that the participants were creating. We thus further calculated and compared the average run-lengths of participants' sequences across all conditions (Fig. 5). This trend also held for individual participants. In the Unaware condition, 28 of the 40 participants (70%) had longer runs during the game than during both pre- and post-game, which is significantly more than expected by chance (binomial test p < 0.001). Further, the run-length in the post-game part was longer than in the baseline (t(39) = -3.39, p = 0.01, 95% CI [-0.18, -0.01]). For the Aware condition, 24 of the 41 participants (58.5%) had longer runs during the game than during both baseline and post-game, which is once more significantly more than expected by chance (binomial test p < 0.001). The run-length in the post-game part was, however, not significantly longer than in the baseline (t(40) = -2.71, p = 0.11, 95% CI [-0.16, 0.008]).
As with the LZC scores, in both experimental conditions the mean run-length was statistically indistinguishable from that of an equivalent computer pseudo-RSG only during the game part (permutation test, Figure 5).
In the Control condition, no significant differences in run-length were found among the 3 parts of the task (repeated-measures ANOVA with Greenhouse-Geisser correction for sphericity, F(1.18, 46.05) = 0.61, p = 0.46, η² = 0.015). The analysis of individual participants did not reveal any clearly reliable differences either: while more participants than expected by chance (17 of the 40) had longer runs during the game than during both pre- and post-game (binomial test p = 0.012), an almost significant number of participants (15 of 40) had shorter runs during the game than during both pre- and post-game (binomial test p = 0.054). This helps explain why, on average, there were no differences between the 3 parts of the experiment.

Figure 5. Average run-length scores of human-generated sequences. Average run-length scores by experimental part and condition are indicated below each violin plot and in black inside each violin plot with 95% CI. The horizontal red line indicates the mean run-length of the 2.5th percentile of 1000 bootstrapped pseudo-random sequences.
We wanted to understand the extent to which the variance in randomness might be explained by the run length. We therefore regressed each sequence's average run-length onto its corresponding LZC score. We found a significant correlation between the two, with the run length explaining almost 3/10 of the variance in randomness across all experiment parts and conditions combined (R² = 0.28, p < 0.001). We similarly found significant correlations between run length and LZC scores for each experiment part separately (R² = 0.35, 0.33, and 0.32 for baseline, game, and post-game, respectively; all p < 0.001, all OLS regressions).
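Since each regression has a single predictor, the reported R² equals the squared Pearson correlation between run length and LZC score; a minimal sketch (the arrays in the test are illustrative only, not the study's data):

```python
def r_squared(x, y):
    """Coefficient of determination of the simple OLS fit of y on x;
    with one predictor this equals the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```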

Questionnaire answers
As part of the post-experiment questionnaire, subjects reported whether they used a strategy against the computer in the game part of the experiment. We thus tested whether there was a difference between the responses of the Aware and Unaware subject groups. Subjects' mean ratings were 3.86±1.61, 4 for Unaware; 4.00±1.61, 4 for Aware; and 5.21±2.02, 6 for Control (mean±STD, median). Hence, the Aware and Unaware conditions were both reliably different from the Control condition but not from each other (ANOVA F(2, 99) = 6.38, p < 0.001, η² = 0.11; post-hoc unpaired t-tests: Unaware vs Aware t(63) = -0.32, p = 1; Unaware vs Control t(64) = -2.78, p = 0.02; Aware vs Control t(71) = -3.28, p = 0.004). In a similar manner, we tested whether subjects' perceptions of being predicted by the computer differed between the Aware and Unaware conditions. Subjects' average scores were 4.06±1.93, 5 for Unaware; 4.62±1.80, 5 for Aware; and 3.65±2.16, 4 for Control (mean±STD, median), which were not significantly different (ANOVA F(2, 99) = 1.96, p = 0.15, η² = 0.04).
We also tested whether participants' perception of their improved ability to generate randomness after the game correlated with their actual improvement in randomness as measured by their LZC scores. The correlation between their answers to questionnaire question 4 (whether the sequence they created in part 3 was more random than the one they created in part 1) and the difference in LZC score between the post-game and baseline sequences was small and not statistically significant (Pearson's R = 0.12, p = 0.22). Similarly, we tested the correlation between the LZC-score difference and ratings of question 3 (whether they managed to create very random sequences in pre-game and post-game). Again, the correlation was very small and not significant (Pearson's R = 0.02, p = 0.76).

Discussion
It is known that humans are unable to generate highly random sequences 26,27 . However, previous literature has demonstrated that humans are more random in competitive situations with feedback 13,16 . Building on that, we set out to examine four research questions. First, we tested the extent to which human RSG in a competitive game situation depended on whether subjects were specifically informed that they must be as random as possible to win the game. We did not find evidence of a difference in randomness between subjects who were aware versus unaware that they had to be as random as possible. In other words, simply informing subjects that they must be as random as possible did not make them more random. Other factors could have contributed to the lack of difference between the Unaware and Aware groups. For example, Hyman and Jenkin 28 demonstrated, for a similar task, that subjects' motivation to succeed and their belief in their ability to succeed contribute to their performance. So, our participants may not have been particularly motivated to win in the game context, as their performance was not tied to any compensation, unlike in other studies 16 . Participants in the Aware and Unaware groups might also have believed that the computer's prediction algorithm worked well and that they were unlikely to be able to beat it.
Interestingly, in the post-experiment questionnaire subjects did not report differences in the extent to which they thought that they had used a strategy against the computer between the Aware and Unaware conditions. Hence, being made aware that they need to be random to beat the computer did not make subjects more likely to use a conscious strategy against the computer than when unaware they needed to be random. Similarly, we found no difference in subjects' perceptions of how much the computer predicted them between the Aware and Unaware conditions. In both cases they felt slightly predicted. So, again, being made aware that the computer was predicting them did not make subjects feel more predicted by the computer. Furthermore, the post-experiment questionnaire confirmed that participants' perception of the randomness of their sequences had very little correlation with the actual randomness of those sequences.
Notwithstanding, our second research question was how random people can be in a competitive game environment. We found that human RSG in such an environment, at least when subjects had to be as random as possible to win, is on average statistically indistinguishable from the pseudo-RSG commonly used in modern programming languages. The extent to which computer pseudo-RSG is random is debatable. (Python's default pseudo-random generator relies on the Mersenne twister algorithm, also the default in Matlab, Stata, Ruby, R, Julia, Scilab, SageMath, etc., which passes the Diehard tests of randomness and the Small Crush battery of the newer TestU01 collection, but fails the more extensive Crush and Big Crush batteries of the TestU01 collection [29][30][31] .) However, our results are nevertheless interesting and set a high benchmark for the human ability to be random.
Third, we were wondering whether the improved RSG exhibited in a competitive game environment would transfer to a post-game environment-i.e., are people more random after the game than they were before the game? We did not find evidence for such a transfer of human RSG ability. Put differently, subjects were on average as random after the game as they were before the game. We also asked them explicitly about their RSG ability before and after the game. And we found that, at the group level at least, participants were generally not able to perceive when they were more or less random after the game. Thus, it appears that participants were able to transfer neither the implicit nor the explicit randomness that they exhibited during the game to their RSG after the game.
Last, it is known that humans underrepresent long runs of short sequences during RSG, i.e., humans over-alternate between items in the series, and this is one reason for the limited human RSG ability (Nickerson, 2002; Nickerson & Butler, 2009). We therefore wanted to test whether changes in RSG ability were related to changes in the length of generated runs. The trends we found for run lengths generally reflected our measure of the level of randomness across the different experimental conditions. Importantly, we also found a direct correlation between the level of randomness and the average run length of the sequence among subjects. So, subjects who had longer runs were also more random. Our analysis showed that the run-length explained about one third of the variance in participants' degree of randomness. It would therefore be interesting to investigate what factors make up the remaining two-thirds of the variability in human randomness. For example, giving instructions in a more intuitive manner, or using a different method of eliciting random sequences, may reveal different results. Wagenaar 27 hypothesized that the difference between verbal instruction and visual display of the potential entries in the random sequence, and therefore the difference between internally and externally represented entries, could affect random-sequence generation. Ayton et al. 32 further suggest that some of the observed non-randomness may come from instructional bias in the experiment, such as explicitly giving examples of unlikely or unacceptable sequences (e.g., telling participants that the sequence "ABC" or meaningful words are unlikely in random sequences of letters). More concretely, Beach and Swensson 33 showed that specific instructions about the gambler's fallacy, telling participants to ignore run dependency, do not help lengthen participants' run lengths. They suggest that the method of eliciting random sequences may instead be more important.
That might explain the results of Peterson and Ulehla 34, who found that people produced shorter runs when the random sequence was generated by throwing dice than by drawing cards; people might simply expect shorter runs from a die than from cards.
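To make the run-length statistic concrete, here is a minimal sketch (our own illustration, not the study's analysis code) that extracts the maximal runs from a rock-paper-scissors choice sequence and computes their mean length. For a uniformly random three-symbol sequence the expected run length is 1/(1 - 1/3) = 1.5, so markedly shorter means indicate over-alternation:

```python
from itertools import groupby

def run_lengths(seq):
    """Lengths of maximal runs of identical symbols, e.g. 'RRPSSS' -> [2, 1, 3]."""
    return [len(list(group)) for _, group in groupby(seq)]

def mean_run_length(seq):
    """Average run length; ~1.5 expected for a uniform i.i.d. three-symbol sequence."""
    runs = run_lengths(seq)
    return sum(runs) / len(runs)
```

For example, `mean_run_length("RRPSSS")` returns 2.0, whereas a maximally alternating sequence such as `"RPSRPS"` returns 1.0, below the 1.5 expected by chance.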
Our results also relate to the "Network-Modulation Model" of randomness proposed by Jahanshahi and colleagues. The model suggests that the superior temporal cortex (STC) is involved in stereotyped responses (such as ordered counting) and that suppressing these habitual responses is necessary for RSG. According to the model, this suppression is achieved via the left dorsolateral prefrontal cortex (DLPFC), which exerts an inhibitory influence on the STC. The model was constructed based on evidence from positron emission tomography, transcranial magnetic stimulation, and lesion studies [35][36][37], and it has more recently gained support from electroencephalography studies as well 38,39. However, if participants can upregulate their DLPFC to exert enough inhibitory influence on the STC during game trials, why did they not transfer this ability to the post-game trials of our experiment, which they knew were coming and during which they were instructed to be as random as possible? One possibility is that something in the game environment triggers the DLPFC's inhibitory influence, but that such upregulation of the DLPFC lies outside voluntary, conscious control. If DLPFC activity could be brought under more voluntary control, e.g., via neurofeedback 40, would up- or down-regulating left DLPFC activity then increase or decrease participants' level of randomness, respectively? Regardless, it remains unclear what made our participants more random in the competitive game environment than in the post-game part. There are at least two differences between the game and post-game trials: the competitive nature of the former and the feedback it provided. A future study could test whether post-game randomness changes after a non-competitive game with constant feedback on one's level of randomness (e.g., via a continuously updated runs-test score), potentially teasing apart the effect of feedback from that of competition.
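Such a continuously updated feedback score could, for instance, be based on a runs test: for a uniform i.i.d. k-symbol sequence of length n, the expected number of maximal runs is 1 + (n - 1)(1 - 1/k). A hypothetical sketch (the function name and design are ours, not the paper's):

```python
import math
from itertools import groupby

def runs_z_score(seq, k=3):
    """Z-score of the observed number of maximal runs against what a
    uniform i.i.d. k-symbol sequence would produce (large-n approximation).

    Near 0: chance-like run structure; strongly positive: over-alternation
    (too many runs); strongly negative: too few, overly long runs.
    """
    n = len(seq)
    observed = sum(1 for _ in groupby(seq))     # number of maximal runs
    p_repeat = 1 / k                            # chance the next symbol repeats
    expected = 1 + (n - 1) * (1 - p_repeat)
    # Each of the n-1 transitions is a Bernoulli(1 - p_repeat) "new run" event;
    # for a uniform i.i.d. source these events are independent.
    variance = (n - 1) * p_repeat * (1 - p_repeat)
    return (observed - expected) / math.sqrt(variance)
```

A perfectly alternating sequence such as `"RPS" * 100` yields a large positive score, while a single long run such as `"R" * 30` yields a large negative one; a feedback display could simply show this score after every trial.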
Competition and feedback could also increase motivation, which might further explain the difference in randomness (e.g., through the mechanism of attention). It would therefore be worth testing whether introducing a monetary reward above a certain randomness threshold, for example, could improve randomness (Hyman and Jenkin 28 suggest it may). In addition, explicitly informing participants that their randomness during the game part is higher than during the pre-game part might make them realize that they do better during the game and facilitate the transfer of their increased randomness to the post-game part. What is more, a neuroimaging study could test whether the higher degree of randomness during the game part correlates with stronger inhibition of the STC by the DLPFC, in accordance with the Network-Modulation Model, or whether other circuits also alter their activity, thereby increasing DLPFC activity or decreasing STC activity.
Recent results suggest that individual RSG ability could potentially be like a fingerprint, a unique biomarker of cognition [41][42][43]. RSG ability also appears to vary among different psychopathological populations [44][45][46]. It appears from our results, among others, that humans cannot easily and consciously modulate their randomness at will. This might explain the randomness-as-a-biomarker phenomenon and supports the use of RSG as a clinical diagnostic tool. For example, if RSG is driven by momentary neural noise, such noise patterns might be individually unique. It would be interesting to test whether these individual RSG patterns generalize from self-paced to competitive game environments.
Finally, human RSG appears related to volition. For example, our finding that RSG in a competitive environment can be statistically indistinguishable from a computer's pseudo-random generator suggests that the upper limit on the arbitrariness of human action may be higher than previously realized. Previous work has also demonstrated biases that may affect decisions, especially when the decision alternatives are of similar value 9. Those biases were found in the DLPFC, which has also been implicated in RSG (the Network-Modulation Model). It would therefore be interesting to investigate the extent to which biases in decision making and RSG are supported by the same neural mechanisms, which may involve the DLPFC.

Author Contributions
GM and UM initiated the project. GM collected some of the data. SMW collected the rest of the data and analyzed all the data. SMW and UM wrote the main manuscript text. All authors reviewed the manuscript.

Figure caption: Following the countdown, participants were required to press the appropriate key for R, P, or S within 500 ms of the onset of the Go signal; otherwise, the trial was forfeited. Their selection was then presented on the screen. In game trials (right), their selection was accompanied by the computer's selection and the game score.