Twenty percent of the time users spend consuming news on the four largest social media sites, they are looking at content linking to one of 98 websites that researchers, professional fact-checkers and journalists agree produce fake, deceptive, low-quality or hyperpartisan news1. This figure excludes misinformation that is produced less systematically, e.g. by news sites that only occasionally err or deceive, or by users themselves. How, then, can the propagation of such information on social media be mitigated? Extant approaches include algorithmically aided misinformation detection2–4, professional fact-checking with subsequent flagging or retraction5,6, and crowdsourced veracity assessments7–9. A shared weakness of these approaches is speed: professional and crowd-based fact-checking takes time, and many algorithms cannot act instantly because they must first gather behavioral data. Until a verdict is reached and a message can be flagged or retracted, potentially false information spreads unchecked.
An emergent approach in the scientific literature on misinformation8,10–12 as well as in practice is to harness the wisdom of the crowd by enabling recipients of an online message on a social media platform to attach veracity assessments to it. This may allow poor initial crowd reception to temper belief in and further spread of misinformation. For example, on Twitter's Birdwatch, users can write notes and attach them to a Tweet, explaining why they believe it is or is not misleading. Other users can rate these notes or write additional notes in response.
The main challenge for this approach is that it must function in online environments where the dominant driver of behavior is not truth seeking but personal convictions, e.g. of a political or ideological nature. Previous studies on misinformation have shown that sharing decisions regarding messages with a clear political leaning are guided primarily by users' ideological congruence with the message and only little by perceived veracity13–15. Nonetheless, previous work on the wisdom of the crowd shows that even when individuals hold strong ideological or other biases, the aggregation of judgments produces an accurate collective assessment as long as the average individual's assessment is better than random. This work assumes that individuals in a crowd cast independent votes16,17, and it suggests that while individual judgments may not be very accurate, their average often closely approximates the truth7,18–20. Recent experimental studies show that when individuals do not make true-or-false decisions independently but are influenced by the decisions of those who came before them – as they would be on social media – their accuracy improves further21–25. This is because, as long as the average decision-maker is more accurate than random, the prior decisions of others tend to nudge decision-makers towards the truth. If a developing rating starts off with a majority of correct decisions, it will influence subsequent users towards making the correct decision, thereby further improving the rating. The same social influence dynamics can, of course, facilitate the spread of a false belief, namely when the initial decisions are incorrect. Subsequent users may then be influenced to make incorrect decisions themselves, further solidifying the incorrect rating.
In previous studies on social influence and the wisdom of crowds19,22,26–28, only chance could generate a large early majority favoring the wrong veracity verdict, because either all subjects first cast an independent vote and could then revise it based on first-round results, or the order in which subjects made decisions was random. In online social network contexts, however, the order in which subjects cast veracity verdicts follows the path through which information disseminates. And herein lies the problem: such an order is far from random. Online social networks, like most social networks, are homophilous29,30, comprising communities of predominantly like-minded peers31–33. The level of segregation rarely reaches the extremity implied by the terms ‘echo chambers’ or ‘filter bubbles’ but is nonetheless substantial34–37. Different groups of online users differ in their ability to identify misinformation4,14,38,39, and this ability correlates with political biases2 and demographic characteristics40. Misinformation is often politically or ideologically charged, or intentionally designed to mislead only a specific part of the population15,41, and it usually appears among and targets those clusters of users who are most susceptible to it. Hence, ratings must be able to cope with misinformation initially being rated in communities of individuals who tend to share the same biases and are likely to believe the misinformation or, in bad faith, misclassify it as true.
Our study explores real-time user ratings under such circumstances in a large-scale experiment with 2,000 liberal and 2,000 conservative subjects in 80 bipartisan communities (Fig. 1). We implement two scenarios in which ratings are broadcast immediately after launch: first, a scenario mimicking the development of a real-time rating in an ideologically integrated network marked by many cross-partisan ties and no clustering according to ideology; second, a scenario representing the typical rating sequence in an ideologically segregated network, with individuals whose ideology is aligned with the content of a message rating the message first and more critical individuals rating it later. These scenarios represent ideal types that mark the extremes of a continuum along which more and less segregated real-world online communities are positioned30,32, thereby maximizing the treatment contrast. We further compare these scenarios with a control condition resembling the setup in which crowd-based ratings have been studied previously7,9: namely, a scenario in which subjects rate messages independently and without information about the rating decisions of others.
The simulation model we now introduce predicts that when communities are ideologically integrated, broadcasting the rating will trigger a positive feedback loop that improves individuals' capacity to differentiate between true and false messages. This happens despite strong ideological bias for or against such messages. Similarly, when true information is rated in segregated communities, early ratings from individuals with an ideological bias in favor of the true message foster the development of a correct rating. However, broadcasting the rating backfires and reduces correct identification when false information is first rated exclusively by ideologically friendly users and only later by ideologically opposed individuals.
In our model, individuals 1 ≤ i ≤ N make a binary rating decision Ci with regard to an informational message m with veracity v = 1 if the message is true, or v = −1 if the message is false. Ratings are made sequentially. Individuals’ propensity to make a correct rating decision \(Prob\left({C}_{i}=1\right)\) is given by the following logistic function:
$$Prob\left({C}_{i}=1\right)={\left(1+\frac{{d}_{i}}{1-{d}_{i}}\,{e}^{-s\times {r}_{i}}\right)}^{-1} \qquad (1)$$
The propensity to correctly classify is negatively impacted by how difficult it is to correctly classify a given message. This difficulty, di, is the probability of incorrectly classifying a message independently, in the absence of information from others (0 ≤ di ≤ 1). di takes on the value dalign for ideologically aligned individuals and dmis for misaligned individuals. The difficulty terms dalign and dmis capture ideological bias stemming from cognitive mechanisms such as motivated reasoning42,43 and confirmation bias44: it is more difficult for aligned individuals to identify a false (aligned) message as false, but less difficult for misaligned individuals to identify a false (misaligned) message as false \((v=-1\to {d}_{align}>{d}_{mis})\). Likewise, cognitive bias makes it less difficult for aligned individuals to find true information true, but more difficult for misaligned individuals \((v=1\to {d}_{align}<{d}_{mis})\). Formally, \({d}_{align}=\bar{d}-(b\times v)/2\) and \({d}_{mis}=\bar{d}+(b\times v)/2\), where \(\bar{d}\) denotes the average level of difficulty in the population. As we use an equal number of aligned and misaligned individuals in each simulation as well as in the experiment we report on later, \(\bar{d}=({d}_{align}+{d}_{mis})/2\). The term b captures the extent to which a message activates bias in individuals (0 ≤ b ≤ 1) and corresponds to the absolute difference in difficulty between aligned and misaligned individuals: b = |dalign − dmis|. Individual i’s propensity to correctly rate a message further depends on the previous classification decisions of others through the rating ri, which is the average of previous decisions (Eq. 2).
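To illustrate the bias parameterization with assumed values (not those of the study): for a false message (v = −1) with average difficulty \(\bar{d}=0.55\) and bias strength b = 0.3,

$${d}_{align}=0.55-\frac{0.3\times (-1)}{2}=0.70, \qquad {d}_{mis}=0.55+\frac{0.3\times (-1)}{2}=0.40,$$

so an aligned individual rating independently would misclassify this false message with probability 0.70, while a misaligned individual would do so with probability 0.40.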
$${r}_{i,\,i>1}=\frac{{\sum }_{j<i}{C}_{j}}{i-1} \qquad (2)$$
ri ranges from −1 (all prior classifications were incorrect) to +1 (all prior classifications were correct). For the first individual, i = 1, ri equals 0. The parameter s denotes the degree to which individuals are influenced by the rating ri. Assuming positive susceptibility to the rating (s > 0), \(Prob\left({C}_{i}=1\right)\) monotonically increases with ri.
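To make Eq. 1 concrete, consider the assumed values from the illustration above: an aligned rater faces a false message with \({d}_{i}=0.70\), has susceptibility s = 2, and observes a mostly correct prior rating \({r}_{i}=0.5\). Then

$$Prob\left({C}_{i}=1\right)={\left(1+\frac{0.7}{0.3}\,{e}^{-2\times 0.5}\right)}^{-1}\approx {\left(1+2.33\times 0.37\right)}^{-1}\approx 0.54,$$

i.e., the positive rating signal lifts this rater’s chance of a correct decision from 0.30 (rating independently) to roughly 0.54.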
We derive hypotheses through simulation of this model. Each simulation run starts with the first individual, i = 1, making a rating decision with \(Prob\left({C}_{1}=1\right)=1-{d}_{1}\) in the absence of prior ratings. The decision of i factors into the rating signal of the next individual, ri+1, influencing i + 1’s rating decision. The simulation stops after individual i = N has made their decision. We match the population sizes of our simulations to those in the experiment (N = 50); similar results are obtained for smaller and larger populations. Simulation runs are executed 10,000 times for each parameter combination of interest. The dependent variable is the fraction of correct rating decisions out of all rating decisions, averaged over simulation runs. We choose a target value that reflects average performance rather than a group decision because real-time ratings are not intended to deliver a final verdict (such as a majority vote) but to improve raters’ capacity to differentiate between true and false messages.
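The sequential logic of these simulations can be sketched in a few lines of Python. This is our own illustrative implementation of Eqs. 1 and 2, not the authors’ code; function names and defaults are assumptions.

```python
import math
import random

def prob_correct(d_i, s, r_i):
    # Eq. 1: propensity of rater i to classify correctly; assumes 0 < d_i < 1.
    return 1.0 / (1.0 + (d_i / (1.0 - d_i)) * math.exp(-s * r_i))

def simulate_run(difficulties, s):
    """One sequential run. difficulties lists d_i in rating order.
    Returns decisions C_i, coded +1 (correct) or -1 (incorrect)."""
    decisions = []
    for i, d_i in enumerate(difficulties):
        r_i = sum(decisions) / i if i > 0 else 0.0  # Eq. 2; r = 0 for the first rater
        correct = random.random() < prob_correct(d_i, s, r_i)
        decisions.append(1 if correct else -1)
    return decisions

def fraction_correct(difficulties, s, runs=10_000):
    # Dependent variable: fraction of correct decisions, averaged over runs.
    n = len(difficulties)
    return sum(simulate_run(difficulties, s).count(1) / n for _ in range(runs)) / runs
```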
We investigate the interplay of rating order, message veracity and cognitive biases in two real-time rating scenarios in which ratings are broadcast immediately. In the segregated scenario, a message originates and spreads in the aligned cluster, so that aligned individuals sequentially rate first; the message then reaches the misaligned cluster, and misaligned individuals rate it until everyone in the population has made their decision. In the integrated scenario, aligned and misaligned individuals alternate in making ratings. These scenarios are compared with an independence scenario in which the choice order alternates as well but the rating is not broadcast, so that individuals make choices without knowledge of others’ ratings (i.e., s = 0, implying \(Prob\left({C}_{i}=1\right)=1-{d}_{i}\) ∀ i).
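In the sketch above, the three scenarios differ only in the order of the difficulty values and in whether the rating is broadcast (s > 0) or withheld (s = 0). With the assumed values dalign = 0.7 and dmis = 0.4 from before and an assumed susceptibility s = 1:

```python
N = 50
d_align, d_mis = 0.7, 0.4  # assumed values for a false message

segregated = [d_align] * (N // 2) + [d_mis] * (N // 2)  # aligned cluster rates first
integrated = [d_align, d_mis] * (N // 2)                # strict alternation

print("segregated  :", fraction_correct(segregated, s=1.0))
print("integrated  :", fraction_correct(integrated, s=1.0))
print("independence:", fraction_correct(integrated, s=0.0))  # rating not broadcast
```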
In the independence scenario, the fraction of correct rating decisions equals the complement of the average level of difficulty in the population, i.e., 1 − \(\bar{d}\). In the integrated scenario, more individuals are expected to make correct rating decisions than in the independence scenario if \(\bar{d}\) < 0.5, and fewer if \(\bar{d}\) > 0.5 (Fig. 2A, left). Namely, if \(\bar{d}\) < 0.5, the first individual is more likely to make a correct than an incorrect rating decision. If the first individual makes a correct rating decision, they influence the following individual to make a correct decision themselves, which enhances the accuracy of the rating for the next individual, and so forth. A real-time rating triggers a positive feedback loop for \(\bar{d}\) < 0.5, where each subsequent ith rating has a higher probability of being correct than the previous one (compare Fig. 2B, left). Individual biases cancel each other out in the alternating ratings of aligned and misaligned individuals. These theoretical expectations hold equally for true and false messages, since we assume no systematic differences in difficulty between true and false information. By contrast, a negative feedback loop, or ‘backfiring’, is expected for \(\bar{d}\) > 0.5, since individuals are then more likely to make incorrect than correct decisions. We accordingly formulate Hypothesis 1:
H1: When it is not too difficult to classify a message correctly (\(\bar{d}\) < 0.5), then individuals in integrated groups (with information about previous rating choices) classify true and false messages correctly more often than individuals in independent groups (without information about previous rating choices).
In the segregated scenario, aligned individuals give ratings first. Since they align with the standpoint of a given message, they are more likely to correctly identify a true message as true; compared to misaligned individuals, however, they have greater difficulty identifying a false message as false. Because aligned individuals rate first, their decisions determine the early accuracy of the rating signal and influence later raters. If messages are true and the difficulty among aligned individuals, dalign, is below 0.5, the rating is likely to enter a positive feedback loop. Later misaligned raters – although less likely to make correct rating decisions due to their bias – will make correct decisions more often than raters without exposure to a rating signal (Fig. 2A and 2B, center). If messages are false and dalign is instead above 0.5, early raters are likely to make incorrect rating decisions and the rating is expected to backfire, resulting in a lower fraction of correct ratings than in independent groups (Fig. 2A, right). This happens even if the average difficulty across all individuals is below 0.5.
H2: When it is not too difficult for ideologically aligned individuals to classify a message correctly (dalign < 0.5), then individuals in segregated groups (with information about previous rating choices) classify true messages correctly more often than individuals in independent groups (without information about previous rating choices).
H3: When it is difficult for ideologically aligned individuals to classify a message correctly (dalign > 0.5), then individuals in segregated groups (with information about previous rating choices) classify false messages correctly less often than individuals in independent groups (without information about previous rating choices).
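The simulation sketch above can serve as a rough numerical check of H2 and H3 (with the assumed values \(\bar{d}\) = 0.55, b = 0.3 and s = 1; the independence benchmark is the analytical value 1 − \(\bar{d}\) = 0.45):

```python
d_bar, b, s = 0.55, 0.3, 1.0
for v, label in [(+1, "true message (H2) "), (-1, "false message (H3)")]:
    d_align = d_bar - (b * v) / 2  # 0.40 for the true, 0.70 for the false message
    d_mis   = d_bar + (b * v) / 2  # 0.70 for the true, 0.40 for the false message
    segregated = [d_align] * 25 + [d_mis] * 25  # aligned individuals rate first
    print(label,
          "segregated:", round(fraction_correct(segregated, s), 3),
          "vs. independence:", round(1 - d_bar, 3))
```

If the model behaves as hypothesized, the segregated fraction should exceed the 0.45 benchmark for the true message (H2) and fall below it for the false message (H3).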
The right side of Fig. 2B illustrates the backfiring of the rating signal when a false message originates in an ideologically aligned cluster, showing that the fraction of correct decisions by an agent’s position in the sequence (averaged over 10,000 simulation runs) is strictly lower in the segregated scenario than in the independence scenario. This can be attributed to the negative feedback loop that is likely to occur when a false message with high aligned difficulty (dalign) accumulates an increasingly incorrect rating signal. The right side of Fig. 2B also shows that the fraction of correct decisions among misaligned individuals increases with position i. Because dmis < dalign for false messages, the rating will recover to some extent among misaligned individuals. Thus, we expect the following dynamics:
H4: When it is difficult for ideologically aligned individuals to classify a message correctly (dalign > 0.5), then in segregated groups (with information about previous rating choices) classification accuracy first gradually deteriorates among aligned individuals (H4a) and then gradually improves among misaligned individuals (H4b).