Design & Procedure
The design was a cross-sectional online survey conducted in Hong Kong, Australia, the USA, the United Kingdom, and Germany. The survey was programmed using the online survey platform Qualtrics. Participants were recruited using stratified quota sampling to ensure that each sample was representative of the respective general population with regard to sex, age, and educational attainment. No further eligibility criteria were applied. Data were collected between February and March 2021. We aimed for a sample size of 2500, taking into account the stratification and number of sites, the large number of predictors, and the expected small effect sizes of some of the putative predictors. The survey took 25 minutes in total, beginning with informed consent, followed by the socio-demographic assessment and the questionnaire battery; further details have been reported elsewhere32. To prevent missing data, participants were required to respond to all questions on each page before being able to continue. This left minimal missing data on isolated variables, caused either by initial software errors (missing data for “perceived infection risk”: 0.2%, n=7; “preferred sources of information”: 2.8%, n=72; “social adversity”: 0.1%, n=3) or by a “don’t know” response option (“size of the current home city”: 9.3%, n=234). Participants were excluded if they failed any of the attention checks, completed the survey in less than half of the median completion time, or showed patterns of machine responses or duplicate response patterns. The flow of participants across sites is shown in Table 1. All procedures were approved by the ethics committees of each of the institutions involved (i.e., (1) University of London Research Ethics Committee, Reference No. 2368; (2) Care New England - Butler Hospital Institutional Review Board, Reference No. 202012-002; (3) La Trobe University Human Research Ethics Committee, Application No. HEC21012; (4) Local Ethics Committee, Universität Hamburg, Application No. 2020_346; and (5) The Chinese University of Hong Kong Survey and Behavioural Research Ethics Committee, Reference No. SBRE-20-233).
Measures
Willingness to be vaccinated for COVID-19 was assessed with the following item: “If a COVID-19 vaccine was offered to you now, would you accept it?” The item was rated on a scale from 1=“Definitely not” to 5=“Yes, definitely”, adapted from Wong and colleagues33.
Sociodemographic data and related questions: Sociodemographic variables included age, sex assigned at birth, and current gender (options: “male”, “female”, “trans-male”, “trans-female”, “genderqueer”, and “other”), size of the current home city (rated in six categories from ≤100,000 to ≥10,000,000), highest educational degree achieved (rated in nine categories from elementary school degree to PhD), annual income (seven categories from “under £18,500/US$24,999/18,000€” to “above £112,000/US$150,000/109,000€”), employment status over the last year (nine categories), migrant status, minority status (five categories, each rated as present or absent), and having a mental-health diagnosis.
Risk perception variables included (1) COVID-19 anxiety, (2) personal experiences with COVID-19 in family members or friends, (3) perceived infection risk, and (4) perceived consequences of an infection. Following Shevlin et al.34, COVID-19 anxiety was assessed using the question “How anxious are you about the coronavirus COVID-19 pandemic?”, for which participants were provided with a slider to indicate their degree of anxiety from 0=“not at all worried” to 100=“very worried”. Personal experiences with COVID-19 in family members or friends were assessed with the following item: “Someone who is close to me has had a COVID-19 virus infection confirmed by a doctor”, rated with 1=“yes” and 0=“no”. Perceived risk of a COVID-19 infection was assessed with the item “What do you think is your personal percentage risk of being infected with the COVID-19 virus over the following time periods?”, rated from 1=“no risk” to 11=“great risk” for each time period (“the next month”, “the next three months”, and “the next six months”). Similarly, perceived consequences of an infection were assessed with “How bad do you think would be the consequences of you being infected with the COVID-19 virus over the following time periods?”, rated from 1=“not too bad” to 11=“very bad”. Mean scores of perceived risk and perceived consequences were calculated.
Political views were rated from 1=“very left-wing” to 7=“very right-wing”, and preferred sources of information (“How do you find out about what is going on in the world?”) were rated from 1=“always from mainstream media” to 5=“always from social media”9.
Specific mistrust variables included (1) COVID-specific paranoid ideation and (2) vaccine conspiracy beliefs. COVID-specific paranoid ideation was assessed with the Pandemic Paranoia Scale32, a 25-item scale assessing paranoid thinking specifically related to the COVID-19 pandemic. It comprises a COVID paranoia global score and the three facets pandemic persecutory threat (15 items, e.g., “People are deliberately trying to pass COVID-19 to me”), pandemic paranoid conspiracy (six items, e.g., “COVID-19 is a conspiracy by powerful people”), and pandemic interpersonal mistrust regarding health measures (four items, e.g., “I can’t trust others to stick to the social distancing rules”). Participants answered on a scale from 0=“not at all” to 4=“totally”. Based on the data used for this article, Kingston et al.32 reported good reliability (internal consistency: α=0.90; test-retest reliability: 0.60≤r≤0.78), factorial validity, and criterion validity. For this study, the three subscales and the global score were calculated. Vaccine conspiracy beliefs were assessed by adapting the general 7-item Vaccine Conspiracy Beliefs Scale35, a valid one-dimensional scale with high internal consistency. The adaptation involved referring to COVID-19 vaccines specifically and using the present tense (full item list in Supplement 1). Reliability in this study was α=0.97.
General mistrust variables included paranoid ideation and general conspiracy mentality. Paranoid ideation was measured with the Revised Green Paranoid Thoughts Scale36. This 18-item questionnaire assesses ideas of reference and persecutory ideation over the past fortnight on two scales. Each item (e.g., “Certain individuals have had it in for me”) is rated on a scale from 0=“not at all” to 4=“totally”. Higher scores indicate higher levels of paranoia. Reliability in this study was α=0.94 for ideas of reference and α=0.96 for persecutory ideation. General conspiracy mentality was assessed with the Conspiracy Mentality Questionnaire37, an instrument designed to efficiently assess differences in the generic tendency to engage in conspiracist ideation within and across cultures. A one-dimensional and time-stable construct has been confirmed across several language versions. It consists of five statements (e.g., “Many very important things happen in the world, which the public is never informed about”) that are rated in terms of their likelihood on a scale from 0=“0% chance” to 11=“100% chance”. Reliability in this study was α=0.91.
Social adversity was screened alongside the socio-demographic variables with a four-item self-report questionnaire used by Jaya and colleagues20. The items consisted of yes/no questions covering emotional neglect, psychological abuse, physical abuse, and sexual abuse (e.g., “Were you ever approached sexually against your will?”).
Generalized beliefs about self, others, and one’s own social rank were assessed with the Brief Core Schema Scales (BCSS)38 and the Social Comparison Scale (SCS)39. The BCSS assesses negative and positive beliefs about oneself and others on four subscales of six items each (e.g., “Other people are bad”), rated as yes versus no. For each yes response, the degree of conviction is assessed on a scale from 1=“no, do not believe it” to 5=“yes, believe it totally”. Reliability for the subscales in the current study ranged from α=0.85 to α=0.90. The SCS consists of 11 bipolar items (e.g., inferior-superior, left out-accepted), each rated from 0 to 10 with reference to the past four weeks. Lower scores indicate a more negative view of the self in comparison with others. Reliability in this study was α=0.95.
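The internal consistencies reported for the scales above are Cronbach's α. A minimal sketch of how such a coefficient can be computed from a participants-by-items response matrix is shown below; the function name and the toy data are illustrative, not the study's actual scoring code.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) response matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_vars / total_var)
```

Perfectly correlated items yield α=1, and α rises as items covary more strongly; dedicated implementations (e.g., in `pingouin`) additionally provide confidence intervals.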
Analyses
Statistical analyses were conducted with SPSS 2240. First, we calculated Pearson correlations for all predictor variables. Next, we calculated multifactorial regression models for each of the variable clusters (1) extended socio-demographic data, (2) risk perception, (3) political view/news source, (4) specific mistrust, (5) general mistrust, (6) interpersonal trauma, and (7) beliefs about the self, others, and social rank in order to compare the explained variance in vaccination willingness for these different predictor types. In a final regression model, all variables were entered to evaluate the overall explained variance. All significance tests for correlations and predictors in regression models were two-tailed tests based on available data, without any further adjustments.
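The cluster-wise comparison of explained variance can be sketched as follows. The analyses were run in SPSS; this is an illustrative re-implementation on simulated data, with placeholder cluster names and column groupings rather than the study's actual variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))                           # simulated predictors
y = 0.5 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(size=n)  # simulated willingness

# Hypothetical predictor clusters (column indices into X)
clusters = {
    "risk_perception": [0, 1],
    "specific_mistrust": [2, 3],
    "general_mistrust": [4, 5],
}

# One multifactorial regression per cluster; compare explained variance (R^2)
for name, cols in clusters.items():
    r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
    print(f"{name}: R^2 = {r2:.3f}")

# Final model: all predictors entered together
r2_all = LinearRegression().fit(X, y).score(X, y)
print(f"all predictors: R^2 = {r2_all:.3f}")
```

Because each cluster model is nested in the full model, the full model's training R² is at least as large as any single cluster's; the interesting comparison is how much each cluster contributes on its own.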
Next, to further test for optimization of prediction accuracy, we established a machine-learning algorithm to predict vaccination willingness (i.e., “definitely” or “probably” getting vaccinated) versus unwillingness (“definitely not” or “probably not” getting vaccinated) using all assessed variables (n=2116) and including missing values in the classification. The mid-category (“possible” willingness) was excluded from this analysis. Calculations of the machine learning models were carried out in Python 3.8.6 with the packages scikit-learn 0.23.241, as well as NumPy, pandas, and imblearn. For all tested models we used random forest classifiers and first conducted hyperparameter tuning on a class-balanced version of the dataset (see Supplement 2 for details). Next, we chose the hyperparameter configuration with the best testing accuracy and evaluated model performance by cross-validating across the five sites and by leave-one-person-out cross-validation19. Finally, we used the calculated machine learning model to evaluate the predictive value of the individual variables. We used permutation feature importance42 (see Supplement 3 for details) to estimate the importance of each variable in a given model. This allowed for the selection of the highest-ranking variables to test whether subsequent smaller machine learning models that use only a small selection of questionnaires retain accuracy. Furthermore, it allowed for the elimination of the highest-ranking variables/variable cluster to further explore their absolute relevance (i.e., whether they could be compensated for by other predictors).
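The core of this pipeline, leaving aside hyperparameter tuning and class balancing, can be sketched with scikit-learn as below. The data are simulated stand-ins; in the study, `X` would hold all assessed variables, `y` the binary willingness label, and `site` the five data-collection sites.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Simulated stand-in data: 8 predictors, binary outcome driven by the first two
rng = np.random.default_rng(42)
n = 600
X = rng.normal(size=(n, 8))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
site = rng.integers(0, 5, size=n)       # five sites, used as CV groups

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validation across the five sites (leave-one-site-out):
# train on four sites, test on the held-out fifth
site_scores = cross_val_score(clf, X, y, groups=site, cv=LeaveOneGroupOut())

# Permutation feature importance on a fitted model: shuffle each feature
# and measure the resulting drop in accuracy
clf.fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]   # highest-ranking first
```

The `ranking` array supports both follow-up steps described above: retraining smaller models on only the top-ranked variables, or dropping the top-ranked variables to see whether the remaining predictors compensate.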