Analysis.
In the analysis, we focused on the following situation: for a target, one person gave an opinion to another (hereafter, the ‘Giver’ and the ‘Receiver’, respectively). Figure 1a illustrates the definition of the ‘helpfulness’ of the Giver’s opinion. The left side represents the following situation: a Giver has already experienced a target and then gives an opinion about it (for example, 70 in the figure) to a Receiver who has not yet experienced it. The right side shows the results of opinion-giving; here, the Receiver has also encountered the target and has formed his/her own judgement (that is, the Receiver’s Own opinion). The upper row on the right side exemplifies a situation in which the Receiver formed an opinion (80) similar to the Giver’s. In this case, we assumed that the Giver’s opinion was ‘helpful’ since it accurately predicted the Receiver’s future satisfaction. Conversely, when the Receiver formed an opinion that differed from the Giver’s (for example, 20 in the lower row), we regarded the Giver’s opinion as relatively ‘unhelpful’.
Note that this assumption is similar to those adopted in previous studies3,4,7,8,25. As mentioned in the Introduction section, we simulated the opinion-giving on a computer using the evaluation data from Studies 1 and 2. These simulations eliminated the possibility that receiving the Giver’s opinion influenced the Receiver’s opinion formation.
For the analysis, we mainly used the theoretical framework of the wisdom of crowds for matters of taste (particularly that of Müller-Trede et al.3), which enabled us to quantitatively investigate the efficacy of the proposed method.
Figure 1b illustrates our analysis. A Giver and a Receiver were independently selected from the participants whose behavioural data were obtained. We then examined the helpfulness of the Giver’s three opinions (that is, the Own, Estimated, and Blended opinions). As shown in Fig. 1b, we employed the mean squared error (MSE) as an index of the helpfulness of an opinion: for each stimulus, we computed the squared difference between the Giver’s opinion and the Receiver’s Own opinion, and we averaged these values across all stimuli. Thus, a smaller MSE indicates that the Giver’s opinion was more helpful.
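To make this index concrete, a minimal sketch of the computation is given below (assuming NumPy arrays of 0–100 rating values; the function name and data layout are illustrative, not from the original analysis code).

```python
import numpy as np

def helpfulness_mse(giver_opinions, receiver_own_opinions):
    """MSE between a Giver's opinions and a Receiver's Own opinions.

    Both arguments are arrays of 0-100 rating values, one entry per
    stimulus. A smaller MSE means the Giver's opinion was more helpful.
    """
    giver = np.asarray(giver_opinions, dtype=float)
    receiver = np.asarray(receiver_own_opinions, dtype=float)
    return np.mean((giver - receiver) ** 2)
```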
Using this simulation procedure, we examined all Giver-Receiver combinations: (i) for a given Giver, every other participant served as a Receiver, and we computed the MSE for each pairing; (ii) the MSE averaged across all Receivers was assigned as the Giver’s value; (iii) we repeated steps (i) and (ii) with every participant acting as the Giver (see the sketch below).
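The full procedure might look like the following sketch, assuming a hypothetical ratings matrix with one row per participant and one column per stimulus; only the leave-one-out averaging logic follows the description above.

```python
import numpy as np

def giver_mse_values(giver_ratings, own_ratings):
    """Average MSE for each participant acting as the Giver.

    giver_ratings: (n_participants, n_stimuli) array of the opinion type
        under evaluation (Own, Estimated, or Blended).
    own_ratings: (n_participants, n_stimuli) array of Own opinions,
        standing in for each Receiver's future judgement.
    """
    n = giver_ratings.shape[0]
    values = np.empty(n)
    for g in range(n):                       # (iii) every participant as Giver
        mses = [np.mean((giver_ratings[g] - own_ratings[r]) ** 2)
                for r in range(n) if r != g]  # (i) all other Receivers
        values[g] = np.mean(mses)            # (ii) average across Receivers
    return values
```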
-----Figure 1 about here-----
Main results.
Figure 2 shows the results of the analysis. Notably, the Blended opinion recorded a lower MSE than the Own opinion across the two studies (Study 1: Wilcoxon signed-rank test, p < .001, Cliff’s delta = 0.37; Study 2: paired t-test, p < .001, Cohen’s d = 1.36). That is, using our method, a Giver could improve his/her opinions. Thus, our main hypothesis was supported.
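For reference, comparisons of this form could be run as in the following sketch (using SciPy; Cliff’s delta is computed by hand since SciPy does not provide it, Cohen’s d uses one common paired-samples formulation, and the placeholder arrays merely stand in for the per-participant MSE values).

```python
import numpy as np
from scipy import stats

def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-pairs."""
    diffs = np.asarray(x)[:, None] - np.asarray(y)[None, :]
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size

# Synthetic placeholders for the per-participant MSE values.
rng = np.random.default_rng(0)
own_mse = rng.uniform(500.0, 2000.0, size=56)
blended_mse = own_mse - rng.uniform(0.0, 400.0, size=56)

# Study 1 (non-normal data): Wilcoxon signed-rank test with Cliff's delta.
w_stat, w_p = stats.wilcoxon(own_mse, blended_mse)
delta = cliffs_delta(own_mse, blended_mse)

# Study 2 (normally distributed data): paired t-test with Cohen's d
# computed as the mean difference divided by the SD of the differences.
t_stat, t_p = stats.ttest_rel(own_mse, blended_mse)
d = np.mean(own_mse - blended_mse) / np.std(own_mse - blended_mse, ddof=1)
```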
As mentioned in the Introduction section, the two studies differed in their stimulus categories. Moreover, the results indicated that they also differed in data structure, as shown in Table 1: in Study 1, the average rating values were below the scale midpoint (that is, 50) for all opinion types, whereas in Study 2, they were around the midpoint. Additionally, as Fig. 3 shows, none of the opinions followed a normal distribution in Study 1, while in Study 2, all of them did (Kolmogorov-Smirnov test; Study 1: ps < .001; Study 2: ps > .1). Taken together, our method was effective across different categories and data structures.
Table 1
The data structure across the two studies. The table shows 95% confidence intervals (CIs) for the mean rating values, computed by bootstrapping. In Study 1, the average rating values for all opinion types were below the scale midpoint (that is, 50), while in Study 2, they were around the midpoint.
| Opinion | Study 1 | Study 2 |
| Own | [21.05, 22.95] | [50.75, 53.53] |
| Estimated | [35.00, 36.88] | [53.00, 55.05] |
| Blended | [28.07, 29.83] | [51.99, 54.08] |
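The bootstrapped CIs could be reproduced along the following lines (a percentile-bootstrap sketch for the mean; the number of resamples is an illustrative assumption).

```python
import numpy as np

def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    boot_means = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```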
-----Figures 2 and 3 about here-----
How did our method work across different data structures? Figure 4 presents typical examples. In Study 1 (Fig. 4a), the rating values of the Own opinion clustered at 0; however, there was also a certain number of large rating values (for example, 100), and in these cases, the MSE for the Own opinion became quite large. By contrast, fewer ratings were 0 in the Blended opinion; instead, its rating values tended to be distributed more evenly between 0 and 75. Put simply, this is why the Blended opinion recorded a lower MSE than the Own. In Study 2 (Fig. 4b), the mean rating value of the Own opinion was similar to that of the Blended (both around 50); however, the distribution of the Own opinion’s rating values was considerably wider than that of the Blended. That is, there were cases in which the Giver’s and the Receiver’s rating values differed greatly (for example, the Giver’s rating was 0 and the Receiver’s was 100, or vice versa). Such cases were relatively rare for the Blended opinion because its distribution was narrower. Thus, our method worked across different data structures.
-----Figure 4 about here-----
Notably, the results also indicated that the Blended opinion had a significantly lower MSE than the Estimated opinion in Study 1 (p < .001; Cliff’s delta = 0.52). Although we did not find such an effect in Study 2 (p = .55, Cohen’s d = 0.18), the findings there still favoured the Blended opinion: we counted the participants whose Estimated (or Blended) opinion yielded a lower MSE than their Own opinion, and more participants showed this improvement with the Blended opinion than with the Estimated opinion (55 versus 44 out of 56 participants; Fisher’s exact test: p < .005). Combined with the results of Study 1, these findings indicate that the Blended opinion improves the Giver’s opinion more effectively than the Estimated opinion.
How our method worked: analysis by decomposing the MSE.
How could our method reduce the error in the Own opinion? It is well established3,47,48 that the MSE can be theoretically decomposed into different sources of error, and performing this decomposition allowed us to observe in detail how our method worked. Several decompositions of the MSE have been suggested3,42,43; among them, we adopted the one proposed by Müller-Trede et al.3 because it consists of psychologically meaningful factors. Their decomposition of the MSE is expressed as follows:
MSE = Bias + Variability bias + Linear correspondence (1)
Firstly, the bias represents the degree to which the Giver’s rating values are higher or lower than the Receiver’s; simply put, it grows as the Giver’s mean rating value departs from the Receiver’s. Secondly, the variability bias indicates how far the variability of the Giver’s opinions deviates from the optimal degree of regression to the mean. Here, variability refers to the standard deviation of the rating values across stimuli, and the optimal degree of regression to the mean is obtained by multiplying the variability of the Receiver’s opinions by the Giver-Receiver correlation coefficient across stimuli. Lastly, the linear correspondence denotes the extent to which the Giver-Receiver relationship deviates from a perfect linear relation (see the mathematical description in ‘Methods’).
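In code, the three components can be computed from the means, standard deviations, and the Giver-Receiver correlation. The sketch below follows the verbal definitions above (with population standard deviations, the three terms sum exactly to the MSE); whether it matches Müller-Trede et al.’s3 exact formulation should be checked against ‘Methods’.

```python
import numpy as np

def decompose_mse(giver, receiver):
    """Decompose the MSE into bias, variability bias, and linear correspondence.

    giver, receiver: arrays of rating values across stimuli.
    """
    giver = np.asarray(giver, dtype=float)
    receiver = np.asarray(receiver, dtype=float)
    rho = np.corrcoef(giver, receiver)[0, 1]
    sd_g, sd_r = giver.std(), receiver.std()      # population SDs (ddof=0)

    bias = (giver.mean() - receiver.mean()) ** 2  # mean-level mismatch
    # Deviation of the Giver's variability from the optimal level,
    # rho * sd_r (the optimal degree of regression to the mean).
    variability_bias = (sd_g - rho * sd_r) ** 2
    # Residual error from an imperfect linear Giver-Receiver relation.
    linear_correspondence = (1 - rho ** 2) * sd_r ** 2

    return bias, variability_bias, linear_correspondence
```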
Table 2 shows the results of the MSE decomposition. Remarkably, the Blended opinion recorded lower values of the variability bias than the Own opinion in both studies (non-overlapping 95% CIs); we did not find such results for the bias or the linear correspondence (for figures, see Fig. S1 of the Supplementary Information). These results indicate that our method worked effectively owing to an improved degree of regression to the mean. Notably, the results are in line with the findings on the wisdom of crowds for matters of taste (Müller-Trede et al.3). As mentioned in the Introduction section, that study extended the wisdom-of-crowds phenomenon from matters of fact to matters of taste; in particular, it demonstrated that a crowd’s opinion mainly reduced the variability bias relative to an individual’s opinion, which produced the wisdom-of-crowds effect. From this viewpoint, our method can be regarded as exploiting the wisdom-of-crowds effect for matters of taste at the within-person level. We shall discuss this research’s contribution to the wisdom-of-crowds literature in the Discussion section.
In Study 1, the Blended opinion also had a lower bias, in addition to a lower variability bias, than the Estimated opinion. This indicates that the rating values of the Estimated opinion were consistently farther from the Receivers’ Own opinions (specifically, higher; see also Table 1) than were those of the Blended opinion. No such result was found in Study 2.
Table 2
The results of the MSE decomposition (95% CIs; in Study 1, none of the opinion types followed a normal distribution, Kolmogorov-Smirnov test: ps < .001). The Blended opinion had a smaller variability bias than the Own opinion in both studies. In addition, in Study 1, the Blended opinion had a smaller bias than the Estimated opinion.
| Study 1 | Bias | Variability bias | Linear correspondence |
| Own | [641.79, 727.88] | [351.34, 415.41] | [283.72, 300.39] |
| Estimated | [787.48, 884.64] | [350.01, 407.73] | [275.79, 291.46] |
| Blended | [609.26, 690.30] | [274.15, 315.90] | [270.67, 286.38] |

| Study 2 | Bias | Variability bias | Linear correspondence |
| Own | [118.91, 179.41] | [207.32, 303.39] | [449.15, 479.40] |
| Estimated | [98.52, 124.97] | [86.37, 162.80] | [447.09, 472.86] |
| Blended | [95.76, 126.20] | [88.53, 145.54] | [432.29, 454.59] |
Further analysis: when does our method work better (or worse)?
For further analysis, we investigated the conditions under which our method worked better or worse, focusing on two factors: individual differences and taste discrimination.
Individual differences. Tastes vary widely among people3,25,49,50: some people’s tastes differ considerably from those of the general public, while others’ tastes are more typical. Here, we examined how individual differences in taste typicality influenced the efficacy of our method.
Figure 5 illustrates our analysis. (1) To quantify the typicality of a Giver’s taste, we calculated the absolute distance between the Giver’s Own opinions and the average of all participants’ Own opinions (hereafter, the ‘distance from average’); a small distance from average indicates a highly typical taste. (2) We then computed the reduction in the MSE by subtracting the MSE obtained when the Giver’s opinion was the Blended opinion from the MSE obtained when it was the Own opinion; the larger the reduction in the MSE, the better the performance of our method. (3) We conducted this analysis across all stimuli and examined the relationship between the distance from average and the reduction in the MSE by calculating the correlation coefficient between them.
The remaining procedure was the same as in the final paragraph of the ‘Analysis’ section: for a Giver, we performed this analysis across the Receivers (all participants except the Giver), the average reduction in the MSE was then assigned as the Giver’s value, and finally, we repeated the procedure for all participants (a sketch of the whole computation follows).
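Under the same hypothetical ratings-matrix layout as before, the procedure might look as follows; the per-stimulus averaging used for the distance from average is one plausible reading of the definition above.

```python
import numpy as np
from scipy import stats

def typicality_analysis(own, blended):
    """Relate taste typicality to the benefit of the Blended opinion.

    own, blended: (n_participants, n_stimuli) arrays of rating values.
    Returns each Giver's distance from average and reduction in the MSE.
    """
    n = own.shape[0]
    grand_mean = own.mean(axis=0)                     # average Own opinion per stimulus
    distance = np.abs(own - grand_mean).mean(axis=1)  # (1) distance from average

    reduction = np.empty(n)
    for g in range(n):
        # (2) reduction = MSE(Own) - MSE(Blended), averaged over Receivers
        per_receiver = [np.mean((own[g] - own[r]) ** 2)
                        - np.mean((blended[g] - own[r]) ** 2)
                        for r in range(n) if r != g]
        reduction[g] = np.mean(per_receiver)
    return distance, reduction

# (3) Correlation between typicality and the benefit of the method,
# e.g. stats.spearmanr(distance, reduction) for non-normal data.
```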
Figure 6 shows the results of this analysis. Each point represents one Giver; the x-axis shows the distance from average, and the y-axis shows the reduction in the MSE. In both studies, we found a significant positive relationship between them (Study 1: rho = 0.42, p < .001; Study 2: r = 0.67, p < .001). That is, our method worked better for Givers with atypical tastes (for further analysis, see Section S2 of the Supplementary Information).
-----Figures 5 and 6 about here-----
Taste discrimination. When assessing items, our judgements differ in how distinct they are. For some items, we can make clear judgements about whether we like them or not (for example, pop and metal music), whereas for other items, we can only make vague judgements (for example, ambient and experimental music). Previous studies1,3 have indicated that this distinctiveness of judgements (called ‘taste discrimination’) plays a critical role in opinion-giving. Notably, Müller-Trede et al.3 showed that taste discrimination affected the helpfulness of opinions in our context: they first provided a theoretical model that defined taste discrimination in terms of the signal-to-noise ratio of judgements and predicted its effects, and they then empirically confirmed these effects in behavioural research, using participants’ familiarity with the stimuli as an index of taste discrimination.
Therefore, we investigated the effect of taste discrimination on the effectiveness of our method. In this section, we only used the behavioural data from Study 2, in which the participants indicated both their Own opinion and their familiarity, as in the study of Müller-Trede et al.3. In addition, we introduced ‘Difficulty’ as a new index of taste discrimination: the participants also directly reported how challenging they found it to give their Own opinion (see ‘Methods’ for details).
We conducted mixed-effects analyses with the reduction in the MSE as the dependent variable and Familiarity, Difficulty, and their interaction as independent variables (Table 3). The effect of Difficulty was significant (F(1, 1123.7) = 48.32, p < .001), whereas we found no effect of Familiarity or of the interaction (Familiarity: F(1, 427.1) = 0.13, p = .72; interaction: F(1, 1311.3) = 0.90, p = .34). Thus, under our experimental settings, only Difficulty influenced the effectiveness of our method: the lower the Difficulty, the greater the reduction in the MSE (reduction in the MSE = − 3.23 × Difficulty + intercept (288.21)).
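For reference, a model of this form could be fitted with statsmodels, as in the sketch below. This is a sketch only: the data are synthetic placeholders, the 0–100 scales, stimulus count, and random-intercept structure are assumptions, and statsmodels reports Wald z-tests per coefficient rather than the Satterthwaite-style F-tests reported here, which typically come from lmerTest in R.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic placeholder data: one row per Giver x stimulus (hypothetical layout).
rng = np.random.default_rng(0)
n_givers, n_stimuli = 56, 24
df = pd.DataFrame({
    "giver": np.repeat(np.arange(n_givers), n_stimuli),
    "difficulty": rng.uniform(0, 100, n_givers * n_stimuli),
    "familiarity": rng.uniform(0, 100, n_givers * n_stimuli),
})
df["reduction"] = 300 - 3.0 * df["difficulty"] + rng.normal(0, 50, len(df))

# Random intercept per Giver; fixed effects for Difficulty, Familiarity,
# and their interaction.
model = smf.mixedlm("reduction ~ difficulty * familiarity",
                    data=df, groups=df["giver"])
result = model.fit()
print(result.summary())
```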
How did lower Difficulty enhance our method? To examine this question, we performed additional mixed-effects analyses separately for the Own and the Blended opinions (Tables 4 and 5, respectively). These analyses used the same independent variables as before (Familiarity, Difficulty, and their interaction) but a different dependent variable: the MSE itself rather than the reduction in the MSE.
The effect of Difficulty was significant for both the Own and the Blended opinions (ps < .001), whereas neither Familiarity nor the interaction was significant (all ps > .1). For both opinions, the MSE was inversely related to Difficulty: as Difficulty increased, the MSE decreased. Importantly, the Own opinion had a steeper slope than the Blended opinion: Own = − 6.48 × Difficulty + intercept (1034.49), and Blended = − 3.03 × Difficulty + intercept (739.58). That is, as Difficulty increased, the Own opinion more rapidly became an accurate prediction. Consequently, the merits of using our method were relatively small when people found it challenging to give their Own opinions.
Table 3
Results of the GLMM with the reduction in the MSE as the dependent variable.
| Independent variable | Statistics |
| Difficulty | F(1, 1123.7) = 48.32, p < .001 |
| Familiarity | F(1, 427.1) = 0.13, p = .72 |
| Difficulty * Familiarity | F(1, 1311.3) = 0.90, p = .34 |
Table 4
Results of the additional GLMM for the Own opinion.
| Independent variable | Statistics |
| Difficulty | F(1, 1298.4) = 92.16, p < .001 |
| Familiarity | F(1, 1211.0) = 1.28, p = .26 |
| Difficulty * Familiarity | F(1, 1329.5) = 1.96, p = .16 |
Table 5
Results of the additional GLMM for the Blended opinion.
| Independent variable | Statistics |
| Difficulty | F(1, 1286.1) = 58.01, p < .001 |
| Familiarity | F(1, 1245.0) = 0.16, p = .69 |
| Difficulty * Familiarity | F(1, 1326.8) = 3.11, p = .078 |