Twenty percent of the time users spend consuming news on the four largest social media sites, they are looking at content linking to one of 98 websites that researchers, professional fact-checkers and journalists agree produce fake, deceptive, low-quality or hyperpartisan news1. This figure excludes misinformation that is produced less systematically, e.g. by news sites that only occasionally err or deceive, or by users themselves. How, then, can the propagation of such information on social media be mitigated? Extant approaches include algorithmically aided misinformation detection2–4, professional fact-checking with subsequent flagging or retraction5,6, and crowdsourced veracity assessments7–9. A shared weakness of these approaches is speed: professional and crowd-based fact-checking takes time, and many algorithms cannot act instantly because they must first gather behavioral data. Until a verdict is reached and a message can be flagged or retracted, potentially false information spreads unchecked.
An emergent approach in the scientific literature on misinformation8,10–12 as well as in practice is to harness the wisdom of the crowd by enabling recipients of an online message on a social media platform to attach veracity assessments to it. This may allow poor initial crowd reception to temper belief in and further spread of misinformation. For example, on Twitter's Birdwatch, users can write notes and attach them to a Tweet, explaining why they believe it is or is not misleading. Other users can rate these notes or write additional notes in response.
The main challenge for this approach is that it must function in online environments where the dominant driver of behavior is not truth seeking but personal convictions, e.g. of a political or ideological nature. Previous studies on misinformation have shown that sharing decisions regarding messages with a clear political leaning are guided primarily by users' ideological congruence with the message and only little by perceived veracity13–15. Nonetheless, previous work on the wisdom of the crowd shows that even when individuals hold strong ideological or other biases, the aggregation of judgments produces an accurate collective assessment as long as the average individual's assessment is better than random. This work assumes that individuals in a crowd cast independent votes16,17, and it suggests that while individual judgments may not be very accurate, their average often closely approximates the truth7,18–20. Recent experimental studies show that when individuals do not make true-or-false decisions independently but are influenced by the decisions of those who came before them – as they would be on social media – their accuracy improves further21–25. This is because, as long as the average decision-maker is more accurate than random, the prior decisions of others tend to nudge decision-makers towards the truth. If a developing rating starts off with a majority of correct decisions, it will influence subsequent users towards making the correct decision, thereby further improving the rating. The same social influence dynamics can, of course, facilitate the spread of a false belief, namely when the initial decisions are incorrect. Subsequent users may then be influenced to make incorrect decisions themselves, further solidifying the incorrect rating.
In previous studies on social influence and the wisdom of crowds19,22,26–28, only chance could generate a large early majority favoring the wrong veracity verdict, because either all subjects first cast an independent vote and could then revise it based on first-round results, or the order in which subjects made decisions was random. In online social network contexts, however, the order in which subjects cast veracity verdicts follows the path through which information disseminates. And herein lies the problem: such an order is far from random. Online social networks, like most social networks, are homophilous29,30, comprising communities of predominantly like-minded peers31–33. The level of segregation rarely reaches the extremity implied by the terms ‘echo chambers’ or ‘filter bubbles’ but is nonetheless substantial34–37. Different groups of online users differ in their ability to identify misinformation4,14,38,39, and this ability correlates with political biases2 and demographic characteristics40. Misinformation is often politically or ideologically charged, or intentionally designed to mislead only a specific part of the population15,41, and it usually appears among and targets those clusters of users who are most susceptible to it. Hence, ratings must be able to cope with misinformation initially being rated in communities of individuals who tend to share the same biases and are likely to believe the misinformation or, in bad faith, misclassify it as true.
Our study explores real-time user ratings under such circumstances in a large-scale experiment with 2,000 liberal and 2,000 conservative subjects in 80 bipartisan communities (Fig. 1). We implement two scenarios in which ratings are broadcast immediately after launch: first, a scenario mimicking the development of a real-time rating in an ideologically integrated network marked by many cross-partisan ties and no clustering according to ideology; second, a scenario representing the typical rating sequence in an ideologically segregated network, with individuals whose ideology is aligned with the content of a message rating the message first and more critical individuals rating it later. These scenarios represent ideal types that mark the extremes of a continuum along which more and less segregated real-world online communities are positioned30,32, thereby maximizing the treatment contrast. We further compare these scenarios with a control condition resembling the setup in which crowd-based ratings have been studied previously7,9: namely, a scenario in which subjects rate messages independently and without information about the rating decisions of others.
The simulation model we now introduce predicts that when communities are ideologically integrated, broadcasting the rating will trigger a positive feedback loop that improves individuals' capacity to differentiate between true and false messages. This happens despite strong ideological bias for or against such messages. Similarly, when true information is rated in segregated communities, early ratings from individuals with an ideological bias in favor of the true message foster the development of a correct rating. However, broadcasting the rating backfires and reduces correct identification when false information is first rated exclusively by ideologically friendly users and only later by ideologically opposed individuals.
In our model, individuals 1 ≤ i ≤ N make a binary rating decision Ci with regard to an informational message m with veracity v = 1 if the message is true, or v = −1 if the message is false. Ratings are made sequentially. Individuals’ propensity to make a correct rating decision \(Prob\left({C}_{i}=1\right)\) is given by the following logistic function:
$$Prob\left({C}_{i}=1\right)={\left(1+\frac{{d}_{i}}{1-{d}_{i}}\,{e}^{-s\times {r}_{i}}\right)}^{-1} \qquad (1)$$
The propensity to correctly classify is negatively impacted by how difficult it is to correctly classify a given message. This difficulty, di, is the probability of incorrectly classifying a message independently, in the absence of information from others (0 ≤ di ≤ 1). di takes on the value dalign for ideologically aligned individuals and dmis for misaligned individuals. The difficulty terms dalign and dmis capture ideological bias stemming from cognitive mechanisms such as motivated reasoning42,43 and confirmation bias44: it is more difficult for aligned individuals to identify a false (aligned) message as false, but less difficult for misaligned individuals to identify a false (misaligned) message as false \((v=-1\to {d}_{align}>{d}_{mis})\). Likewise, cognitive bias makes it less difficult for aligned individuals to find true information true, but more difficult for misaligned individuals \((v=1\to {d}_{align}<{d}_{mis})\). Formally, \({d}_{align}=\bar{d}-(b\times v)/2\) and \({d}_{mis}=\bar{d}+(b\times v)/2\), where \(\bar{d}\) denotes the average level of difficulty in the population. As we use an equal number of aligned and misaligned individuals in each simulation as well as in the experiment we report on later, \(\bar{d}=({d}_{align}+{d}_{mis})/2\). The term b captures the extent to which a message activates bias in individuals (0 ≤ b ≤ 1) and corresponds to the absolute difference in difficulty between aligned and misaligned individuals: b = |dalign − dmis|. Individual i’s propensity to correctly rate a message further depends on the previous classification decisions of others through the rating ri, which is the average of previous decisions (Eq. 2).
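To illustrate the bias parameterization with assumed values (not those of the study): for a false message (v = −1) with average difficulty \(\bar{d}=0.55\) and bias strength b = 0.3,

$${d}_{align}=0.55-\frac{0.3\times (-1)}{2}=0.70, \qquad {d}_{mis}=0.55+\frac{0.3\times (-1)}{2}=0.40,$$

so an aligned individual rating independently would misclassify this false message with probability 0.70, while a misaligned individual would do so with probability 0.40.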
$${r}_{i,\,i>1}=\frac{{\sum }_{j<i}{C}_{j}}{i-1} \qquad (2)$$
ri ranges from −1 (all prior classifications were incorrect) to +1 (all prior classifications were correct). For the first individual, i = 1, ri equals 0. The parameter s denotes the degree to which individuals are influenced by the rating ri. Assuming positive susceptibility to the rating (s > 0), \(Prob\left({C}_{i}=1\right)\) monotonically increases with ri.
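To make Eq. 1 concrete, consider the assumed values from the illustration above: an aligned rater faces a false message with \({d}_{i}=0.70\), has susceptibility s = 2, and observes a mostly correct prior rating \({r}_{i}=0.5\). Then

$$Prob\left({C}_{i}=1\right)={\left(1+\frac{0.7}{0.3}\,{e}^{-2\times 0.5}\right)}^{-1}\approx {\left(1+2.33\times 0.37\right)}^{-1}\approx 0.54,$$

i.e., the positive rating signal lifts this rater’s chance of a correct decision from 0.30 (rating independently) to roughly 0.54.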
We derive hypotheses through simulation of this model. Each simulation run starts with the first individual, i = 1, making a rating decision with \(Prob\left({C}_{1}=1\right)=1-{d}_{1}\) in the absence of prior ratings. The decision of i factors into the rating signal of the next individual, ri+1, influencing i + 1’s rating decision. The simulation stops after individual i = N has made their decision. We match the population sizes of our simulations to those in the experiment (N = 50); similar results are obtained for smaller and larger populations. Simulation runs are executed 10,000 times for each parameter combination of interest. The dependent variable is the fraction of correct rating decisions out of all rating decisions, averaged over simulation runs. We choose a target value that reflects average performance rather than a group decision because real-time ratings are not intended to deliver a final verdict (such as a majority vote) but to improve raters’ capacity to differentiate between true and false messages.
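The sequential logic of these simulations can be sketched in a few lines of Python. This is our own illustrative implementation of Eqs. 1 and 2, not the authors’ code; function names and defaults are assumptions.

```python
import math
import random

def prob_correct(d_i, s, r_i):
    # Eq. 1: propensity of rater i to classify correctly; assumes 0 < d_i < 1.
    return 1.0 / (1.0 + (d_i / (1.0 - d_i)) * math.exp(-s * r_i))

def simulate_run(difficulties, s):
    """One sequential run. difficulties lists d_i in rating order.
    Returns decisions C_i, coded +1 (correct) or -1 (incorrect)."""
    decisions = []
    for i, d_i in enumerate(difficulties):
        r_i = sum(decisions) / i if i > 0 else 0.0  # Eq. 2; r = 0 for the first rater
        correct = random.random() < prob_correct(d_i, s, r_i)
        decisions.append(1 if correct else -1)
    return decisions

def fraction_correct(difficulties, s, runs=10_000):
    # Dependent variable: fraction of correct decisions, averaged over runs.
    n = len(difficulties)
    return sum(simulate_run(difficulties, s).count(1) / n for _ in range(runs)) / runs
```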
We investigate the interplay of rating order, message veracity and cognitive biases in two real-time rating scenarios in which ratings are broadcast immediately. In the segregated scenario, a message originates and spreads in the aligned cluster, so that aligned individuals sequentially rate first; the message then reaches the misaligned cluster, and misaligned individuals rate it until everyone in the population has made their decision. In the integrated scenario, aligned and misaligned individuals alternate in making ratings. These scenarios are compared with an independence scenario in which the choice order alternates as well but the rating is not broadcast, so that individuals make choices without knowledge of others’ ratings (i.e., s = 0, implying \(Prob\left({C}_{i}=1\right)=1-{d}_{i}\) ∀ i).
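In the sketch above, the three scenarios differ only in the order of the difficulty values and in whether the rating is broadcast (s > 0) or withheld (s = 0). With the assumed values dalign = 0.7 and dmis = 0.4 from before and an assumed susceptibility s = 1:

```python
N = 50
d_align, d_mis = 0.7, 0.4  # assumed values for a false message

segregated = [d_align] * (N // 2) + [d_mis] * (N // 2)  # aligned cluster rates first
integrated = [d_align, d_mis] * (N // 2)                # strict alternation

print("segregated  :", fraction_correct(segregated, s=1.0))
print("integrated  :", fraction_correct(integrated, s=1.0))
print("independence:", fraction_correct(integrated, s=0.0))  # rating not broadcast
```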
In the independence scenario, the fraction of correct rating decisions equals the complement of the average level of difficulty in the population, i.e., 1 − \(\bar{d}\). In the integrated scenario, more individuals are expected to make correct rating decisions than in the independence scenario if \(\bar{d}\) < 0.5, and fewer if \(\bar{d}\) > 0.5 (Fig. 2A, left). Namely, if \(\bar{d}\) < 0.5, the first individual is more likely to make a correct than an incorrect rating decision. If the first individual makes a correct rating decision, they influence the following individual to make a correct decision themselves, which enhances the accuracy of the rating for the next individual, and so forth. A real-time rating triggers a positive feedback loop for \(\bar{d}\) < 0.5, where each subsequent ith rating has a higher probability of being correct than the previous one (compare Fig. 2B, left). Individual biases cancel each other out in the alternating ratings of aligned and misaligned individuals. These theoretical expectations hold equally for true and false messages, since we assume no systematic differences in difficulty between true and false information. By contrast, a negative feedback loop, or ‘backfiring’, is expected for \(\bar{d}\) > 0.5, since individuals are then more likely to make incorrect than correct decisions. We accordingly formulate Hypothesis 1:
H1: When it is not too difficult to classify a message correctly (\(\bar{d}\) < 0.5), then individuals in integrated groups (with information about previous rating choices) classify true and false messages correctly more often than individuals in independent groups (without information about previous rating choices).
In the segregated scenario, aligned individuals give ratings first. Since they align with the standpoint of a given message, they are more likely to correctly identify a true message as true; compared to misaligned individuals, however, they have greater difficulty identifying a false message as false. Because aligned individuals rate first, their decisions determine the early accuracy of the rating signal and influence later raters. If messages are true and the difficulty among aligned individuals, dalign, is below 0.5, the rating is likely to enter a positive feedback loop. Later misaligned raters – although less likely to make correct rating decisions due to their bias – will make correct decisions more often than raters without exposure to a rating signal (Fig. 2A and 2B, center). If messages are false and dalign is instead above 0.5, early raters are likely to make incorrect rating decisions and the rating is expected to backfire, resulting in a lower fraction of correct ratings than in independent groups (Fig. 2A, right). This happens even if the average difficulty across all individuals is below 0.5.
H2: When it is not too difficult for ideologically aligned individuals to classify a message correctly (dalign < 0.5), then individuals in segregated groups (with information about previous rating choices) classify true messages correctly more often than individuals in independent groups (without information about previous rating choices).
H3: When it is difficult for ideologically aligned individuals to classify a message correctly (dalign > 0.5), then individuals in segregated groups (with information about previous rating choices) classify false messages correctly less often than individuals in independent groups (without information about previous rating choices).
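The simulation sketch above can serve as a rough numerical check of H2 and H3 (with the assumed values \(\bar{d}\) = 0.55, b = 0.3 and s = 1; the independence benchmark is the analytical value 1 − \(\bar{d}\) = 0.45):

```python
d_bar, b, s = 0.55, 0.3, 1.0
for v, label in [(+1, "true message (H2) "), (-1, "false message (H3)")]:
    d_align = d_bar - (b * v) / 2  # 0.40 for the true, 0.70 for the false message
    d_mis   = d_bar + (b * v) / 2  # 0.70 for the true, 0.40 for the false message
    segregated = [d_align] * 25 + [d_mis] * 25  # aligned individuals rate first
    print(label,
          "segregated:", round(fraction_correct(segregated, s), 3),
          "vs. independence:", round(1 - d_bar, 3))
```

If the model behaves as hypothesized, the segregated fraction should exceed the 0.45 benchmark for the true message (H2) and fall below it for the false message (H3).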
The right side of Fig. 2B illustrates the backfiring of the rating signal when a false message originates in an ideologically aligned cluster, showing that the fraction of correct decisions by an agent’s position in the sequence (averaged over 10,000 simulation runs) is strictly lower in the segregated scenario than in the independence scenario. This can be attributed to the negative feedback loop that is likely to occur when a false message with high aligned difficulty (dalign) accumulates an increasingly incorrect rating signal. The right side of Fig. 2B also shows that the fraction of correct decisions among misaligned individuals increases with position i. Because dmis < dalign for false messages, the rating will recover to some extent among misaligned individuals. Thus, we expect the following dynamics:
H4: When it is difficult for ideologically aligned individuals to classify a message correctly (dalign > 0.5), then in segregated groups (with information about previous rating choices) classification accuracy first gradually deteriorates among aligned individuals (H4a) and then gradually improves among misaligned individuals (H4b).