Improving the accuracy of self-reported height and weight in surveys: An experimental study

DOI: https://doi.org/10.21203/rs.3.rs-1487157/v1

Abstract

Background

Many studies rely on self-reported height and weight. While a substantial body of literature exists on misreporting of height and weight, little exists on improving accuracy. The aim of this study was to determine, using an experimental design, whether the accuracy of self-reported height and weight data can be increased by improving how these questions are asked in surveys, drawing on the relevant evidence from the psychology and survey research literatures.

Methods

Two surveys from two separate studies were used to test our hypotheses (Science Survey, n = 1,200; Eating Behaviours Survey, n = 200). Participants were randomly assigned to one of six conditions, four of which were designed to improve the accuracy of the self-reported height and weight data (“preamble”) and two of which served as control conditions (“no preamble”). Four hypotheses were tested: (H1) survey participants who are read a preamble prior to being asked their height and weight will report lower heights and higher weights than those not read a preamble; (H2) the impact of question-wording (i.e., preamble vs. no preamble) on self-reported weight will be greater for participants with higher BMIs; (H3) the impact of question-wording on height will be greater for older participants; (H4) one version of the weight question (standard or “weight-specific”) may result in more accurate self-reported weight than the other. Multivariate analysis of variance was conducted to test Hypothesis 1; two-way ANOVAs were conducted to test Hypothesis 2; moderation analysis was used to test Hypothesis 3; an independent samples t-test was conducted to test Hypothesis 4.

Results

None of the hypotheses was supported.

Conclusions

This paper provides an important starting point from which to inform further work exploring how question wording can improve self-reported measurement of height and weight. Future research should explore how question preambles may or may not operationalise hypothesised underlying mechanisms, the sensitivity or intrusiveness of height and weight questions, individual beliefs about one’s height and weight, and survey context.

Background

Obesity is an important risk factor for a wide range of chronic diseases (Ng et al. 2013; Wang et al. 2011; Guh et al. 2009). Despite research demonstrating the limitations of the use of body mass index (BMI) as a measure of body fatness (Shah & Braverman 2012; Rothman 2008; Prentice & Jebb 2001; Burkhauser & Cawley 2008), BMI continues to be used for clinical diagnoses (Cameron et al. 2017; Neermark et al. 2019; Amster et al. 2020) and to estimate population rates of overweight and obesity (Aune et al. 2016; Cole et al. 2007; Hall & Cole 2006), with higher BMI associated with increased risk of obesity-related comorbidities and increased morbidity and mortality (Weir & Jan 2019; Després 2012; Neeland et al. 2015). BMI is calculated by dividing a person’s weight in kilograms by their height in metres squared. A BMI of less than 18.5 is considered underweight, between 18.5 and 25 normal weight, between 25 and 30 overweight, and over 30 obese (WHO, 2021).
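
To make the calculation concrete, the following is a minimal sketch in Python of the BMI formula and the WHO cut-offs described above; the function names and the example values are ours and are not drawn from either survey.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """WHO categories as used in this paper."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Example: 80 kg at 1.75 m gives a BMI of about 26.1 ("overweight").
example = bmi(80, 1.75)
print(round(example, 1), bmi_category(example))
```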

Ideally, height and weight are measured by a clinician, using calibrated instruments such as a stadiometer for height and weighing scales for weight (Biehl et al. 2013). However, self-report measures are often used in large population health studies due to limitations in funding and resources (Turrell et al. 2021; Watson & Wooden 2012). Research comparing self-reported height and weight data with clinical data generally finds discrepancies between the two sets of measurements, with certain groups of people over-reporting height and/or under-reporting weight (Flegal et al. 2019; Danubio et al. 2008). The result can be underestimation of BMI (Maukonen et al. 2018; Gosse 2014) and misclassification of individuals as “underweight”, “normal weight”, “overweight” and “obese” (Hodge et al. 2020), leading to lower estimates of obesity prevalence as well as greater random error (Flegal et al. 2019). Formulas designed to correct for this error have been only partly successful (Ayre et al. 2012; Gosse 2014; Neermark et al. 2019).

Given this reliance on self-report to calculate BMI, it is important to explore ways to gather more accurate data using this approach. One possibility largely ignored in the public health literature is to improve the way the questions about height and weight are asked in surveys. It has long been known in the survey research literature that how questions are asked can have a significant impact on responses (Kalton & Schuman, 1982; Schwarz, 1999). It therefore seems plausible that the accuracy of self-reported height and weight could be improved by asking the questions differently.

The primary aim of this study was to determine, using an experimental design, whether the accuracy of self-reported height and weight data can be increased by improving how these questions are asked in surveys. The findings will contribute to the evidence base on understanding self-reporting bias, and help integrate the literatures that currently exist somewhat separately in the psychological and survey research disciplines.

Accuracy of self-reported BMI

Studies comparing measured and self-reported BMI find that, although the correlations between the two measures are generally high (Davies et al 2020; Hodge et al 2020; Ng 2019), there is a bias towards overreporting of height and underreporting of weight, resulting in an underreporting of BMI (Maukonen et al. 2018; Flegal et al. 2019) and subsequent misclassification of BMI categories among participants. This systematic error results in misclassification bias, of which there are two types: differential and non-differential. Differential misclassification is related to other study variables whereas non-differential misclassification is not (Rothman 2012:133). Non-differential misclassification is less likely to bias estimates, and tends to produce estimates that are “diluted” or closer to the null. This means that if there is no effect to begin with, non-differential misclassification is unlikely to bias the effect estimate (Rothman 2012:134). Biases from differential misclassification are less predictable, and can either exaggerate or underestimate an effect (Rothman 2012:134). The issue of misclassification bias is particularly pertinent for studies measuring self-reported height and weight: studies in which all subgroups have an equal chance of misclassification of BMI categories (i.e. non-differential misclassification) have more predictable bias, and are less likely to be biased overall.
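
To illustrate the distinction, the small simulation below (our own illustration, not drawn from either survey, with invented risks and misclassification rates) shows that non-differential misclassification of an exposure such as BMI category pulls an association towards the null, whereas misclassification that depends on the outcome can push the estimate in either direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True exposure (obese vs. not) with a genuinely higher disease risk when obese.
obese = rng.random(n) < 0.3
disease = rng.random(n) < np.where(obese, 0.20, 0.10)  # true risk ratio = 2.0


def risk_ratio(exposure, outcome):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    return outcome[exposure].mean() / outcome[~exposure].mean()


# Non-differential: 20% of participants are misclassified regardless of disease status.
flip_nd = rng.random(n) < 0.20
obese_nd = obese ^ flip_nd

# Differential: misclassification depends on the outcome (here, only cases are misclassified).
flip_d = (rng.random(n) < 0.40) & disease
obese_d = obese ^ flip_d

print(f"true risk ratio:              {risk_ratio(obese, disease):.2f}")     # about 2.0
print(f"non-differential (diluted):   {risk_ratio(obese_nd, disease):.2f}")  # closer to 1
print(f"differential (unpredictable): {risk_ratio(obese_d, disease):.2f}")
```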

Existing research suggests that individuals with higher BMIs tend to underreport weight (Hodge et al. 2020; Maukonen et al 2018; Wen & Kowaleski-Jones 2012; Tang et al. 2016), whereas older people tend to overestimate height (Amster et al 2020; Taylor et al. 2006). Thus, misclassification appears to be differential rather than non-differential (Pham et al. 2019). Conclusions regarding the impact of this bias range from slight to significant (Spencer et al. 2002; Glaesmer & Brahler 2002; Engstrom et al. 2003; Kuczmarski et al. 2001; Dhaliwal et al. 2010; Ng 2019; Flegal et al. 2018). Nevertheless, all agree that more accurate data is preferable.

Explanations for this misreporting

To improve the accuracy of self-reported height and weight data, it is necessary to understand why these data are misreported. Whereas the psychological literature has mostly focused on the reporting of traits and attitudes, and the survey literature has emphasised the reporting of behaviours, it appears that similar processes lead to both types of misrepresentation (Tourangeau & Yan 2007).

The most commonly proffered explanation from both the psychology and survey methodology literatures is social desirability (Leary 2019). This theory argues that people have a strong desire for others to see them in a positive light. In cultures that favour lower weight and greater height, people may report being taller and weighing less than their actual measurements to promote a more positive picture of themselves to others, such as a survey interviewer (Larson 2000). A recent study supporting this theory found that women’s social desirability score was significantly correlated with the discrepancy between self-reported and measured body weights after adjusting for their actual weight (Luo et al. 2019).

This distorted self-presentation may constitute either a “deliberately deceptive act” (i.e. impression management) or simply a “self-serving estimation error” (i.e. self-deception) (DeAndrea et al. 2012; Uziel 2010; Braun 2001). DeAndrea et al. (2012) argue that one may distinguish between the two possibilities by establishing whether “ground truth” (i.e., knowledge of one’s true height and weight) is present. In other words, if someone knows their actual height and weight, any reported distortion of these data is deliberate, whereas if they are unsure of their actual height and weight, or at least have convinced themselves that they are unsure, they may simply report the data favourably. This theory suggests that, if one could either determine or enhance “ground truth”, the accuracy of reported height and weight would improve.

If the theory of social desirability is correct as applied to the self-reporting of height and weight, and people misreport their height and weight in order to influence an interviewer to think better of them, then one solution to this data bias problem would be to remove the influence of interviewers and instead conduct the survey using an anonymous mode, such as online or mail, rather than over the telephone or face-to-face. A considerable body of research, however, finds that in many cases more socially desirable responses are provided to survey questions even when there is no one asking the questions, thus casting doubt on this theory as the sole explanation for the misreporting of height and weight (Gnambs & Kaspar 2017). Kreuter et al. (2008), for example, found no differences in responses between interviewer- and self-administered modes for a set of five normative behaviours, including receiving academic honours and donating money to the university. Research by the Pew Research Center (Keeter et al. 2015) found little difference in the reported frequency of church attendance between participants assigned randomly to a telephone interview or a web survey.

Another possible explanation for bias in self-reported height and weight is based on Identity theory, which concerns what people value and how people view themselves (Stryker and Burke 2000). Rather than providing survey responses to convince the interviewer that they are a worthy person, survey participants may instead be expressing their self-identity as a worthy (i.e. a slightly taller and lighter) person. The participant sees themselves, or wants to see themselves, as healthy, active and attractive, and thus responds to the height and weight questions in a way that more closely accords with this self-view. If someone values being fit and attractive, and views themself as being fit and attractive, they may underreport their true weight and/or overreport their true height as a low-cost opportunity to enact their identity (Brenner & DeLamater 2016). Brenner & DeLamater (2016:337) posit that, rather than being motivated solely by concerns regarding self-presentation, “the respondent pragmatically reinterprets the question to be one about identity rather than behavior, a process influenced by a desire for consistency between the ideal self and the actual self. This pragmatic interpretation of the survey question encourages the respondent to answer in a way that affirms strongly valued identities.” Identity theory, unlike social desirability theory, does not predict that responses to socially desirable questions will be more biased with non-anonymous survey modes (i.e. when another person is asking the questions), but instead predicts greater bias when self-identity does not accord closely with reality. Thus, conventional direct survey questions can prompt the participant to reflect not only on the actual self, but also on their ideal self (Higgins 1987).

Impact of question wording on responses to sensitive questions

It is clear from the survey research literature that how survey questions are asked can impact on responses. This is particularly true for “sensitive” questions, such as illicit drug use, abortion, and sexual behavior, and “intrusive” questions such as household income, although what is considered sensitive or intrusive likely differs by demographic group, cultural background (Johnson & Van de Vijver, 2003) and individual (Tourangeau & Yan 2007).

There is evidence that specifically asking participants to provide accurate information, sometimes referred to as a priming procedure, improves the accuracy of responses to sensitive or intrusive survey questions (Tourangeau & Yan 2007). Another promising approach to improving the accuracy of self-reported height and weight is to provide additional assurances regarding the confidentiality of the data, which has been shown to reduce misreporting (Singer et al. 1995). Although most surveys provide such assurances at the start of the survey, or as part of the informed consent process, additional reassurance prior to asking the height and weight questions may improve reporting.

Finally, framing effects may be important (de Bruin et al. 2011). Framing refers to the process by which people perceive and conceptualise an issue. Framing effects occur when changes in the presentation of an issue produce changes of opinion (Chong & Druckman 2007). Two subtypes of framing effects are wording and context effects. Context effects refer to the influence on survey responses of the context in which a question is asked; wording effects refer to the language used to ask the question. These effects have been observed on an array of issues (e.g., Petrinovich & O'Neill 1996; Druckman 2001; Feldman & Hart 2018). Although normally discussed in relation to attitudes, framing effects may also be important for other types of survey responses, such as self-reporting of height and weight. Little research, however, has examined their impact on these types of questions.

Magelssen et al. (2016), for example, examined the impact of context and wording on support for assisted dying in Norway by randomly assigning participants to different versions of the same questions. In one version, participants were simply asked whether they agreed or disagreed that physician-assisted suicide should be allowed for persons who have a terminal illness with short life expectancy. The second version added information that included an example of a particular patient who ‘is in great pain’, careful consideration by a doctor, and the choice of the patient to ‘avoid great suffering.’ Whereas the first version asks about ‘physician-assisted suicide’ and ‘euthanasia’, the second version uses the phrase ‘a lethal drug dose that the patient can choose to take to avoid great suffering’. The result was significantly greater support for assisted dying among participants assigned to the second version. Another example of wording effects, in the area of economic attitudes, finds that expectations and perceptions regarding future inflation rates were lower and less variable when participants were asked about “inflation” as opposed to “prices in general” or “prices you pay” (de Bruin et al. 2012). These effects of context and wording, however, do not always hold. Singer and Couper (2014), for example, conducted an experiment in which they randomly assigned participants to questions about attitudes toward prenatal testing and abortion framed either in terms of “baby” or “fetus”, with the expectation that support would be higher for those assigned to the second version. They found no significant differences by question wording for abortion preferences and only small, though significant, differences for prenatal testing. They did, however, find that question wording made substantial differences in the responses of some demographic subgroups. It may be that attitudes towards abortion, in particular, are so strongly held by many that framing effects have little impact.

Finally, the presence of an authoritative citation, where the question is asked with the addition of an authoritative statement supporting it, has also been shown to affect survey responses – again, mostly on attitude questions (Cocco & Tuzzi 2013). Cocco & Tuzzi (2013), in an Italian study examining the impact of question-wording and context on attitudes towards homosexual behaviour and a possible law against homophobia, found more negative responses with the addition of the following statement: “Silvio Berlusconi has stated that it is better to appreciate beautiful girls than to be gay.” One may argue about the “authoritativeness” of such a statement; nevertheless, the point holds that the statement is attached to a person of authority.

The aim of this study was to determine, using an experimental design, whether the accuracy of self-reported height and weight data can be increased by improving how these questions are asked in surveys, drawing on the relevant evidence from the psychology and survey research literatures. Four hypotheses are tested.

Methods

Two surveys from two separate studies were used to test our hypotheses. Ethics approval for Study 1 (“Science Survey”) was provided by the Australian National University Human Research Ethics Committee. Ethics approval for Study 2 (“Eating Behaviours (EB) Survey”) was provided by the Charles Sturt University Human Research Ethics Committee. All methods were performed in accordance with guidelines and regulations set out by the above institutions.

Participants and procedure

The Science Survey consisted of a Random Digit Dialling (RDD) Computer Assisted Telephone Interview (CATI) survey of 1200 Australian adults (aged 18+) across Australia. The EB Survey consisted of an RDD CATI survey of 200 non-metropolitan Australian adults (aged 18+). The participation rate (AAPOR 2016) for the Science survey was 43.2%. Of the 5,637 telephone numbers dialled, 1,065 were unusable (e.g. disconnected; not a residential number), for 1,371 there was no contact (e.g. no answer; answering machine; engaged), and 426 were deemed out of scope (e.g. non-English speaking; no one aged 18+ in the household). Of the 2,776 telephone numbers considered in scope, 1200 interviews were completed. The participation rate for the EB survey was 34.7%. Of the 2,867 telephone numbers dialled, 1,524 were unusable, for 522 there was no contact, and 79 were deemed out of scope. Of the 742 telephone numbers considered in scope, 200 interviews were completed.

Informed consent was obtained from each participant before starting the survey. No incentive was provided for participation. Both surveys were conducted by the Social Research Centre, a social research company.

Key demographics of the two samples are presented in Table 1. Given that the population of the EB Survey was non-metropolitan Australians whereas the population of the Science Survey was all Australians, it is not surprising that Science Survey participants were somewhat younger, more highly educated, had higher incomes, and lower BMIs (AIHW 2017) as compared with the EB Survey participants; the Science Survey sample also had a more even mix of men and women. 

 
Table 1

Survey sample characteristics

| Variable | Science Survey (n = 1200): n | % | EB Survey (n = 200): n | % |
| --- | --- | --- | --- | --- |
| Age | 1170 | 97.5 | 200 | 100.0 |
| 18–34 years | 200 | 17.1 | 35 | 17.5 |
| 35–54 years | 467 | 39.9 | 92 | 46.0 |
| 55+ years | 503 | 43.0 | 73 | 36.5 |
| Mean | 51.03 | | 43.58 | |
| Standard deviation | 16.30 | | 17.88 | |
| Gender | 1200 | 100.0 | 200 | 100.0 |
| Female | 609 | 50.8 | 114 | 57 |
| Male | 591 | 49.2 | 86 | 43 |
| Location | 1200 | 100.0 | 200 | 100.0 |
| Metro | n/a | n/a | 0 | 0 |
| Non-metro | n/a | n/a | 200 | 100 |
| Australian Capital Territory | 100 | 8.3 | 0 | 0 |
| New South Wales | 234 | 19.5 | 53 | 26.5 |
| Northern Territory | 100 | 8.3 | 11 | 5.5 |
| Queensland | 212 | 17.7 | 49 | 24.5 |
| South Australia | 127 | 10.6 | 24 | 12.0 |
| Tasmania | 100 | 8.3 | 16 | 8.0 |
| Victoria | 191 | 15.9 | 27 | 13.5 |
| Western Australia | 136 | 11.3 | 20 | 10.0 |
| Highest education | 1185 | 98.8 | 193 | 87.7 |
| Completed less than Year 12 | 282 | 23.8 | 63 | 32.6 |
| Completed Year 12 | 207 | 17.5 | 34 | 17.6 |
| Completed TAFE or other certificate | 308 | 26.0 | 47 | 24.5 |
| Completed university or higher | 388 | 32.7 | 51 | 26.6 |
| Household income | 942 | 78.5 | 186 | 93.0 |
| < $80,000 | 537 | 57.0 | 116 | 62.4 |
| >= $80,000 | 405 | 43.0 | 70 | 37.6 |
| BMI | 1080 | 90.0 | 192 | 96.0 |
| Underweight (< 18.5) | 24 | 2.3 | 3 | 1.6 |
| Normal (18.5–25) | 482 | 40.2 | 65 | 33.9 |
| Overweight (25–30) | 377 | 31.4 | 78 | 40.6 |
| Obese (> 30) | 197 | 16.4 | 46 | 24.0 |
| Mean | 26.21 | | 27.22 | |
| Standard deviation | 5.04 | | 5.14 | |


Measures

The focus of the Science Survey was on public attitudes towards science. The questions relevant to this study comprised a module in the second part of the survey. The height and weight questions were asked after the science attitude questions and a general health question, but before the demographic questions. The focus of the EB Survey was on eating behaviours and physical and mental health. The height and weight questions were asked after a series of questions about eating behaviours, attentiveness to messages about dieting, health conditions, and smoking behaviour.

In both surveys, participants were randomly assigned to one of three versions of a preamble to the height and weight questions. They were then independently randomly assigned to one of two versions of the weight question. Therefore, in each survey, each participant was asked one of six combinations of height and weight questions (see Table 2). The height question was always asked before the weight question.

 
Table 2

Height and weight questions

Science Survey question versions

Preamble and height question:
  • Science_a (no preamble): “How tall are you without shoes?”
  • Science_b: “I’m now going to ask you your height and weight. It’s very important that we get as accurate data on these questions as possible. Can you tell me, how tall are you without shoes?”
  • Science_c: “I’m now going to ask you your height and weight. Research shows that people tend to overestimate their height and underestimate their weight. It’s very important that we get as accurate data on these questions as possible. Can you tell me, how tall are you without shoes?”

Weight question:
  • Science_Wa (standard): “How much do you weigh?” (ONLY IF QUERIED: ‘Without clothes or shoes’)
  • Science_Wb (specific): “How much do you weigh without clothes or shoes?”

| Version | Preamble/height question | Weight question | n |
| --- | --- | --- | --- |
| Sci-1 | Science_a | Science_Wa | 172 |
| Sci-2 | Science_a | Science_Wb | 190 |
| Sci-3 | Science_b | Science_Wa | 224 |
| Sci-4 | Science_b | Science_Wb | 217 |
| Sci-5 | Science_c | Science_Wa | 200 |
| Sci-6 | Science_c | Science_Wb | 171 |

EB Survey question versions

Preamble and height question:
  • EB_1 (no preamble): “How tall are you without shoes?”
  • EB_2: “I’m now going to ask you your height and weight. Please respond honestly. Our data rely on honest answers. Remember your name is not associated with your responses, so no one will know your height and weight. Can you tell me, how tall are you without shoes?”
  • EB_3: “I’m now going to ask you your height and weight. Research shows that people tend to OVERestimate their height and UNDERestimate their weight. It’s very important that we get as accurate data on these questions as possible. Can you tell me, how tall are you without shoes?”

Weight question:
  • EB_W1 (standard): “How much do you weigh?” (ONLY IF QUERIED: ‘Without clothes or shoes’)
  • EB_W2 (specific): “How much do you weigh without clothes or shoes?”

| Version | Preamble/height question | Weight question | n |
| --- | --- | --- | --- |
| EB-1 | EB_1 | EB_W1 | 37 |
| EB-2 | EB_1 | EB_W2 | 38 |
| EB-3 | EB_2 | EB_W1 | 34 |
| EB-4 | EB_2 | EB_W2 | 26 |
| EB-5 | EB_3 | EB_W1 | 29 |
| EB-6 | EB_3 | EB_W2 | 36 |


In the Science Survey, participants were randomly assigned to one of the following ‘preamble conditions’: (a) “no preamble” condition, in which they were simply asked how tall they are without shoes; (b) “accountability/priming” condition, in which the interviewer first stressed the importance of gathering accurate data on height and weight before asking the height question; or (c) “authoritarian citation/accountability/priming” condition, in which participants were also told that research shows that people tend to overestimate height and underestimate weight. In the EB Survey, participants were randomly assigned to one of the following ‘preamble conditions’: (a) “no preamble” condition, identical to the Science Survey; (b) “context/priming/confidentiality” condition, in which the interviewer not only stressed the importance of accuracy in gathering height and weight data, but also reiterated participant anonymity; or (c) “context/priming/authoritarian citation” condition, which was similar to the corresponding Science Survey condition.

Following the height question, participants were again randomly assigned – this time to one of two weight questions. The “standard” version simply asked the participant to report their weight. Only if queried were they told this meant their weight without clothes or shoes. The “specific” version specified weight without clothes or shoes. Table 2 indicates the 12 (six for each study) combinations of height and weight questions posed to participants along with the number of participants in each condition.
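
As an illustration of the design (not code used in either study), the sketch below shows the independent two-stage random assignment in Python; the condition labels are our own shorthand for the conditions described above.

```python
import random

# Preamble conditions (three per survey) and weight-question versions (two),
# giving the 3 x 2 = 6 combinations listed in Table 2.
PREAMBLES = ["no_preamble", "accountability_priming", "authoritarian_citation"]
WEIGHT_QUESTIONS = ["standard", "specific"]


def assign_condition(rng: random.Random) -> tuple[str, str]:
    """Independently draw a preamble condition and a weight-question version."""
    return rng.choice(PREAMBLES), rng.choice(WEIGHT_QUESTIONS)


rng = random.Random(42)
print([assign_condition(rng) for _ in range(3)])
```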


Hypotheses

Using an experimental design, this study tested whether the addition of preambles to the height and weight questions would result in more accurate self-reported height and weight responses. As we did not have data on participants’ actual height and weight, we used the comparative approach (Krumpal 2013), whereby lower height and higher weight are assumed to be more accurate. In addition, we tested hypotheses supported by the literature regarding the differential impact of improved question-wording on specific sub-population groups.

The following hypotheses were tested:

Hypothesis 1

Participants who are read a preamble (either Sci-3-4 (accountability + priming) or Sci-5-6 (accountability + priming + authoritarian citation), or EB-3-4 (context + priming + confidentiality) or EB-5-6 (context + priming + authoritarian citation)) prior to being asked the height and weight questions will report lower height and higher weight, on average, than those who are not read a preamble (Sci-1-2; EB-1-2).

Hypothesis 2

The association between question-wording (i.e. preamble vs. no preamble) and self-reported weight will be greater for individuals with higher BMIs.

Hypothesis 3

The association between question-wording and self-reported height will be greater for older participants.

Hypothesis 4

We also test, but have no directional hypothesis regarding, whether the “standard” or “specific” weight question (see Table 2) will result in more accurate (i.e. higher) self-reporting of weight. On the one hand, the “specific” version (“without clothes or shoes”) might be expected to result in reports of lower (though arguably more accurate) weights, because people weigh slightly less without clothes and shoes. On the other hand, the additional wording in the “specific” version may prime participants to report more accurately (DeAndrea et al. 2012), which should result in higher self-reported weights.

Statistical analyses

Sample sizes were based largely on practical considerations. For the Science survey, the relevant questions were added to an existing planned survey that required a sample size of 1200. The sample size (n = 200) for the EB survey was determined by budget. The variables height, weight, BMI, and age all satisfied standard tests for normality and other statistical assumptions (Tabachnick & Fidell 2013). All observed height and weight outliers were included in the analysis as the reported values were in a plausible range (Großschädl et al. 2012).

The two surveys were analysed separately as their populations were different (i.e., all Australians 18+; non-metropolitan Australians 18+). The following tests were used to examine demographic differences between the six question-version groups: chi-square test for independence (gender); Kruskal-Wallis test (education; household income); one-way analysis of variance (ANOVA) (age). Multivariate analysis of variance (MANOVA) was conducted to test Hypothesis 1. Moderation analysis using the Hayes PROCESS macro was used to test Hypothesis 3. As BMI is calculated using weight (the dependent variable), moderation analysis was not appropriate for testing Hypothesis 2. Instead, two-way ANOVAs were conducted with question-wording and BMI categories (“normal weight”, “overweight”, “obese”) as factors, in order to examine the interaction between question-wording and BMI. As few participants were classified as “underweight” based on BMI (Science Survey: n = 24, 2.2%; EB Survey: n = 5, 2.4%), they were not included in this analysis. An independent samples t-test was conducted to test Hypothesis 4. All analyses were conducted using IBM SPSS Statistics for Windows, version 26 (IBM Corp., Armonk, NY, USA).
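
For readers who want to reproduce comparable analyses outside SPSS, the sketch below shows one way the four hypothesis tests could be run in Python with pandas, SciPy and statsmodels. It is illustrative only: the file and column names are hypothetical placeholders, and the moderation analysis is approximated with an ordinary least squares interaction model rather than the PROCESS macro used here.

```python
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("science_survey.csv")  # hypothetical file with one row per participant

# H1: MANOVA of self-reported height and weight by question wording
# (preamble vs. no preamble).
manova = MANOVA.from_formula("height_cm + weight_kg ~ preamble", data=df)
print(manova.mv_test())

# H2: two-way ANOVA, question wording x BMI category ("normal weight",
# "overweight", "obese"); the interaction term tests the hypothesis.
model = ols("weight_kg ~ C(preamble) * C(bmi_cat)",
            data=df[df["bmi_cat"] != "underweight"]).fit()
print(sm.stats.anova_lm(model, typ=2))

# H3: moderation analysis, i.e. a question wording x age interaction predicting
# self-reported height (roughly equivalent to PROCESS model 1).
moderation = ols("height_cm ~ C(preamble) * age", data=df).fit()
print(moderation.summary())

# H4: independent-samples t-test comparing the "standard" and "specific"
# versions of the weight question.
standard = df.loc[df["weight_q"] == "standard", "weight_kg"].dropna()
specific = df.loc[df["weight_q"] == "specific", "weight_kg"].dropna()
print(stats.ttest_ind(standard, specific, equal_var=False))
```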

Results

There were no differences in either survey among the six question-version groups on the following demographics: gender (Science survey: Chi-square (5, n = 1180) = 3.19, p = .67, Cramer’s V = .05; EB Survey: Chi-square (5, n = 200) = 9.33, p = .10, Cramer’s V = .22), education (Science survey: Chi-square (5, n = 1165) = 6.06, p = .300; EB Survey: Chi-square (5, n = 198) = 7.22, p = .205), or household income (Science survey: Chi-square (5, n = 927) = 4.30, p = .507; EB Survey: Chi-square (5, n = 185) = 8.56, p = .128). For age, there were no significant differences among the six question-version groups in the Science survey (F(5, 1144) = 0.23, p = .95), but there was a significant difference between EB-4 (mean = 53.5) and EB-5 (mean = 37.1) in the EB survey (F(5, 193) = 2.77, p = .02).

Hypothesis 1: Survey participants who are read a preamble prior to being asked their height and weight will report lower heights and higher weights than those not read a preamble.

As can be seen in Tables 3 and 4, there was no significant difference between the no preamble and preamble groups on self-reported height or weight. Therefore, Hypothesis 1 was not supported. Participants in the Science Survey who were read either the Sci-3 or Sci-4 (accountability; priming) or Sci-5 or Sci-6 (accountability; priming; authoritarian citation) preambles did not report lower heights and higher weights as compared with participants who were read no preamble (Sci-1 or Sci-2). Similarly, participants in the EB Survey who were read the EB-3 or EB-4 (context + priming + confidentiality) or EB-5 or EB-6 (context + priming + authoritarian citation) preambles did not report lower heights and higher weights as compared with participants given no preamble (EB-1 or EB-2).

Table 3. Mean scores and standard deviations for Height and Weight as a function of question-wording (preamble vs. no preamble)

  

Table 4. Multivariate and univariate analyses of variance for Height and Weight

 

A comparison of each of the preambles separately with the no preamble condition (i.e. no preamble vs. accountability + priming vs. accountability + priming + authoritarian citation in the Science survey, and no preamble vs. context + priming + confidentiality vs. context + priming + authoritarian citation in the EB survey) also resulted in no statistically significant differences, although we note that in the EB survey, mean weight for the context + priming + confidentiality condition was 5.4 kg higher than for the no preamble condition (81.6 kg (SD = 17.35) vs. 76.2 kg (SD = 18.09)), and 5.1 kg higher than for the context + priming + authoritarian citation condition (76.5 kg, SD = 14.57; F(2, 197) = 2.02, p = .135). Neither of the preambles in the Science Survey included a confidentiality statement.

Hypothesis 2: The impact of question-wording (i.e., preamble vs. no preamble) on self-reported weight will be greater for participants with higher BMIs.

The mean scores and standard deviations for weight as a function of BMI category are presented in Table 5. The interaction effects between question-wording and BMI category were not statistically significant (Science survey: F(2, 1057) = 0.65, p = .52; EB survey: F(2, 189) = 2.13, p = .122). Therefore, Hypothesis 2 was not supported.

Table 5. Mean scores and standard deviations for Weight as a function of BMI category

 

Hypothesis 3: The impact of question-wording on height will be greater for older participants.

As shown in Table 6 and Figures 1a and 1b, although the slopes were in the expected directions, the interaction between question-wording (i.e. preamble vs. no preamble) and age was not statistically significant. Therefore, Hypothesis 3 was not supported.

Table 6. Age as a moderator between question-wording and self-reported height  

Hypothesis 4: One version of the weight question (standard or “weight-specific”) may result in more accurate self-reported weight than the other.

As can be seen in Table 7, there was no significant difference between the standard and specific groups on self-reported weight.   Therefore, Hypothesis 4 was not supported. 

Table 7.   Differences in self-reported weight for standard vs. specified weight question

 

Discussion and Conclusions

No significant differences were observed in self-reported height or weight between survey participants who were read preambles to the height and weight questions designed to elicit more accurate responses (i.e. lower heights and higher weights), and those who were not. There was also no support for the hypothesis that the impact of question-wording on self-reported weight would be greater for participants with higher BMIs, nor for the hypothesis that the impact of question-wording on self-reported height would be greater for older participants.

In designing the preambles, we focused on those mechanisms identified in the literature as improving the accuracy of responses to questions deemed sensitive or intrusive; namely, accountability; priming; assurances of confidentiality; framing, wording, and context; and authoritative citation. One possible explanation for our results is that the wording of the preambles did not correctly operationalise the hypothesised underlying mechanisms. For example, perhaps the phrase, “Research shows that people tend to OVERestimate their height and UNDERestimate their weight” did not correctly operationalise the mechanism of authoritative citation. It is also possible that participants did not pay much attention to the preamble and simply heard the questions about height and weight. This is more of an issue with web surveys, however (Brosnan et al. 2019), than when an interviewer is reading out the questions.

Another possibility is that the identified mechanisms do not apply to questions about height or weight. Perhaps questions about height and weight are simply not sensitive or intrusive enough to be amenable to manipulation by question wording (Tourangeau et al. 2000). An examination of the data revealed that while just 0.3% of participants in the Science survey refused to answer a “neutral” question about whether they have a scale in their house, 2.3% refused to provide their weight and 1.2% refused to provide their height. This compares with 2.5% who refused to provide their age, and 10.7% who refused to provide their household income; the latter question has notoriously high refusal rates (Yan, 2021). These data suggest that questions about height and, in particular, weight are arguably “sensitive”. In the EB survey, however, no one refused to provide their height and just 0.5% (one participant) refused to provide their weight. This difference in refusal rates between the two surveys may have something to do with within-survey context: the EB survey was introduced as a survey about health and eating behaviours, and thus questions about height and weight were likely not unexpected. The Science survey, in contrast, was introduced to participants as a survey about attitudes towards science, with most questions on this topic, and thus questions about one’s height and weight may have been viewed as unexpected and therefore sensitive or intrusive. Alternatively, the lower refusal rates in the EB survey may be due to the different populations of the two surveys: non-metropolitan residents in the EB survey as compared with mostly metropolitan residents in the Science survey. The refusal rates for age (0%) and household income (0.3%) were also very low in the EB survey, although an additional 9.8% of participants said they “didn’t know” their household income, which may indicate soft refusals (Sell et al. 2007; Hamrick 2012).

Alternatively, perhaps beliefs about one’s height and weight are so firmly fixed, whether due to faulty memory (for example, what one weighed as a young adult as opposed to now) or a strong identity attachment to being taller and thinner than one actually is (Brenner and DeLamater 2016), that promptings designed to trigger the identified mechanisms simply fail (Singer and Couper 2014). However, evidence that people who strongly suspected they would be weighed and measured following questions about their height and weight were less likely to bias their self-reports (Wen & Kowaleski-Jones 2012) suggests this may not be the case. Instead, accountability may be key to counteracting this distortion of self-presentation. According to impression management theory, it may do more damage to one’s impression management to be caught lying about one’s height and weight than to be seen as shorter and heavier than is societally desirable (DeAndrea et al. 2012). The survey research literature refers to this phenomenon as the “bogus pipeline” (Tourangeau & Yan 2007). Großschädl et al. (2012), for instance, posited that an explanation for their finding of fewer discrepancies between self-reported and actual height and weight measurements for women and older people than those found in most other studies was that participants completed the survey as part of a health check, and thus likely knew that they would also have their height and weight measured. It is also possible that people who volunteer for a health check are more aware of their actual height and weight, and therefore have greater “ground truth”. Although several of the preambles attempted to trigger “accountability” by stressing the importance of gathering accurate data, this is surely a weaker prod than the “threat” of being weighed and measured. Future studies may want to consider asking participants (who own working scales and/or tape measures) to weigh and measure themselves and report these data, to see if this increases accuracy. This would also help establish ground truth (i.e. their actual measurements), and thus help determine whether biased reporting is a “deliberately deceptive act” or simply a “self-serving estimation error” (DeAndrea et al. 2012).

Finally, it is possible that the broader context of the survey plays a role. In the Science survey, where most of the survey questions asked about attitudes towards science, none of the group differences by preamble condition was close to statistically significant. In the EB survey, however, which focused on health and eating behaviours, most of the differences were in the expected directions and several approached statistical significance.

Limitations of this study, in addition to the small sample size of the EB survey, include that the EB survey was limited to non-metropolitan residents. It is possible that rural Australians are affected by question-wording about height and weight differently than metropolitan residents. Another limitation is that the surveys used for these analyses were designed to serve a number of purposes, and therefore some of the questions were not ideally designed to answer the research questions posed in this study. In particular, each of the preambles combines mechanisms purported to impact responses, rather than testing each individually, such that it is impossible to disentangle the impact of each.

Nevertheless, and despite a lack of support for any of the four hypotheses, we believe that this paper makes an important contribution to the literature. From a population health perspective, it is important that self-reported height and weight be as accurate as possible and that we continue to seek ways to achieve this. Despite evidence from the survey research literature that question-wording can have a significant impact on responses, to date little research has examined whether the accuracy of self-reported height and weight data can be improved by asking these questions differently. This study aimed to do this, drawing on both the survey research evidence on question-wording and the psychological literature on self-report bias.  We believe that this study makes an important contribution to the evidence regarding self-report bias, as well as discussing some promising avenues for future research on this topic.  

In particular, we recommend conducting an adequately powered study focused on health that tests both single mechanisms and combinations of mechanisms, in order to systematically determine whether and when question-wording can improve the accuracy of self-reported height and weight. Specifically, we believe it is worth further exploring the accountability mechanism by incorporating the possibility of actual height and weight measurements. We also propose further testing of the confidentiality mechanism, which was included in only one of the EB survey preambles. Although not statistically significant, the mean self-reported weight for the context + priming + confidentiality condition was 5.4 kg higher than for the no preamble condition, and 5.1 kg higher than for the context + priming + authoritarian citation condition. Other recommendations for future research are to include measures of social desirability (Luo et al. 2019) and identity attachment, to better understand their role in the self-reporting of height and weight. Finally, qualitative research would be useful to better understand the extent to which people associate being taller and/or thinner with an ideal, and how this may impact on the self-reporting of height and weight.

Abbreviations

BMI: body mass index

Study 1: Science Survey

Study 2: Eating Behaviours Survey

EB: eating behaviours

RDD: random digit dialling

CATI: computer assisted telephone interview

ANOVA: analysis of variance

MANOVA: multivariate analysis of variance

Declarations

Ethics approval and consent to participate

Ethics approval for Study 1 (“Science Survey”) was provided by the Australian National University Human Research Ethics Committee (2008/114). Ethics approval for Study 2 (“Eating Behaviours (EB) Survey”) was provided by the Charles Sturt University Human Research Ethics Committee (2010/144). All methods were performed in accordance with guidelines and regulations set out by the above institutions.

Survey participants provided informed consent prior to the start of the survey.

Consent for publication

Not applicable

Availability of data and materials

The “Science Survey” dataset analysed during the current study is available in the Australian Data Archive repository, https://dataverse.ada.edu.au/dataverse/ada

The “EB Survey” dataset analysed during the current study is available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

This study was funded in part by a Research Development Fund from Charles Sturt University.  In addition, the Social Research Centre provided an in-kind contribution towards this study.  Neither funder had any role in the design, analysis, or interpretation of data, or in the writing of the manuscript.  The Social Research Centre collected the survey data upon which the analysis in this paper is based.

Authors' contributions

NV conceptualised the study, analysed and interpreted the data, and drafted the manuscript.  ED assisted in the acquisition of data, provided input on the study design, and substantively revised the draft manuscript.  JR helped analyse and interpret the data and substantively revised the draft manuscript.  All authors read and approved the final manuscript.

Acknowledgements

The authors would like to acknowledge the following people for their contribution to this study:

  • Professor Barbara Masser and Dr Matthew Bourke for their statistical advice and assistance
  • Professor Ian MacAllister for agreeing to add several questions to his ANU Science Poll

References

  1. Amster R, Reychav I, McHaney R, Zhu L, Azuri J. Credibility of self-reported health parameters in elderly population. Primary health care research & development. 2020;21.
  2. Aune D, Sen A, Norat T, Janszky I, Romundstad P, Tonstad S, et al. Body mass index, abdominal fatness, and heart failure incidence and mortality: a systematic review and dose–response meta-analysis of prospective studies. Circulation. 2016;133(7):639–49.
  3. Aune D, Sen A, Schlesinger S, Norat T, Janszky I, Romundstad P, et al. Body mass index, abdominal fatness, fat mass and the risk of atrial fibrillation: a systematic review and dose–response meta-analysis of prospective studies. European journal of epidemiology. 2017;32(3):181–92.
  4. Ayre T, Wong J, Kumar A. Investigating the discrepancy between measured and self-reported BMI in the National Health Survey. ABS of Personality Assessment. 2012;63(3):506–16.
  5. Braun HI, Jackson DN, Wiley DE. Socially desirable responding: The evolution of a construct. In: The role of constructs in psychological and educational measurement. Routledge; 2001. pp. 61–84.
  6. Brenner PS, DeLamater J. Lies, damned lies, and survey self-reports? Identity as a cause of measurement bias. Social psychology quarterly. 2016;79(4):333–54.
  7. Brosnan K, Babakhani N, Dolnicar S. “I know what you’re going to ask me” Why respondents don’t read survey questions. International Journal of Market Research. 2019;61(4):366–79.
  8. Burkhauser RV, Cawley J. Beyond BMI: the value of more accurate measures of fatness and obesity in social science research. Journal of health economics. 2008;27(2):519–29.
  9. Cameron N, Godino J, Nichols JF, Wing D, Hill L, Patrick K. Associations between physical activity and BMI, body fatness, and visceral adiposity in overweight or obese Latino and non-Latino adults. International journal of obesity. 2017;41(6):873–7.
  10. Chong D, Druckman JN. Framing theory. Annu Rev Polit Sci. 2007;10:103–26.
  11. Cocco M, Tuzzi A. New data collection modes for surveys: a comparative analysis of the influence of survey mode on question-wording effects. Quality & quantity. 2013 Oct;47(6):3135–52.
  12. Cole TJ, Flegal KM, Nicholls D, Jackson AA. Body mass index cut offs to define thinness in children and adolescents: international survey. Bmj. 2007;335(7612):194.
  13. Danubio ME, Miranda G, Vinciguerra MG, Vecchi E, Rufo F. Comparison of self-reported and measured height and weight: Implications for obesity research among young adults. Economics & Human Biology. 2008;6(1):181–90.
  14. Davies A, Wellard-Cole L, Rangan A, Allman-Farinelli M. Validity of self-reported weight and height for BMI classification: A cross-sectional study among young adults. Nutrition. 2020;71:110622.
  15. de Bruin WB, Baldassi M, Figner B, Fischhoff B, Fleishman L, Hardisty D, Johnson E, Keren G, Konnikova M, Levin I, Li Y. Framing effects in surveys: How respondents make sense of the questions we ask. Perspectives on Framing, edited by Gideon Keren. 2011:303 – 25.
  16. de Bruin WB, Van Der Klaauw W, Topa G, Downs JS, Fischhoff B, Armantier O. The effect of question wording on consumers’ reported inflation expectations. Journal of Economic Psychology. 2012;33(4):749–57.
  17. DeAndrea DC, Tom Tong S, Liang YJ, Levine TR, Walther JB. When do people misrepresent themselves to others? The effects of social desirability, ground truth, and accountability on deceptive self-presentations. Journal of Communication. 2012;62(3):400–17.
  18. Després J-P. Body fat distribution and risk of cardiovascular disease: an update. Circulation. 2012;126(10):1301–13.
  19. Dhaliwal SS, Howat P, Bejoy T, Welborn TA. Self-reported weight and height for evaluating obesity control programs. American journal of health behavior. 2010;34(4):489–99.
  20. Druckman JN. On the limits of framing effects: Who can frame?. The journal of politics. 2001 Nov 1;63(4):1041-66.
  21. Biehl A, Hovengen R, Meyer HE, Hjelmesæth J, Meisfjord J, Grøholt EK, Roelants M, Strand BH. Impact of instrument error on the estimated prevalence of overweight and obesity in population-based surveys. BMC Public Health. 2013 Dec;13(1):1–6.
  22. Engstrom JL, Paterson SA, Doherty A, Trabulsi M, Speer KL. Accuracy of self-reported height and weight in women: an integrative review of the literature. Journal of midwifery & women’s health. 2003;48(5):338–45.
  23. Feldman L, Hart PS. Climate change as a polarizing cue: Framing effects on public support for low-carbon energy policies. Global Environmental Change. 2018;51:54–66.
  24. Flegal KM, Kit BK, Graubard BI. Bias in hazard ratios arising from misclassification according to self-reported weight and height in observational studies of body mass index and mortality. American journal of epidemiology. 2018 Jan 1;187(1):125 – 34.
  25. Flegal KM, Ogden CL, Fryar C, Afful J, Klein R, Huang DT. Comparisons of self-reported and measured height and weight, BMI, and obesity prevalence from national surveys: 1999–2016. Obesity. 2019;27(10):1711–9.
  26. Glaesmer H, Brähler E. Prevalence estimation of overweight and obesity based on subjective data of body-mass-index (BMI). Gesundheitswesen (Bundesverband der Arzte des Offentlichen Gesundheitsdienstes (Germany)). 2002;64(3):133–8.
  27. Gnambs T, Kaspar K. Socially desirable responding in web-based questionnaires: A meta-analytic review of the candor hypothesis. Assessment. 2017;24(6):746–62.
  28. Gosse MA. How accurate is self-reported BMI? Nutrition bulletin. 2014;39(1):105–14.
  29. Großschädl F, Haditsch B, Stronegger WJ. Validity of self-reported weight and height in Austrian adults: sociodemographic determinants and consequences for the classification of BMI categories. Public health nutrition. 2012;15(1):20–7.
  30. Guh DP, Zhang W, Bansback N, Amarsi Z, Birmingham CL, Anis AH. The incidence of co-morbidities related to obesity and overweight: a systematic review and meta-analysis. BMC public health. 2009;9(1):1–20.
  31. Hall DM, Cole TJ. What use is the BMI? Archives of disease in childhood. 2006;91(4):283–6.
  32. Hamrick KS. Nonresponse bias analysis of body mass index data in the eating and health module. USDA-ERS Technical Bulletin. 2012 Aug 1(1934).
  33. Higgins ET. Self-discrepancy: a theory relating self and affect. Psychological review. 1987;94(3):319.
  34. Hodge JM, Shah R, McCullough ML, Gapstur SM, Patel AV. Validation of self-reported height and weight in a large, nationwide cohort of US adults. PloS one. 2020;15(4):e0231229.
  35. Johnson TP, Van de Vijver FJ. Social desirability in cross-cultural research. Cross-cultural survey methods. 2003;325:195–204.
  36. Kalton G, Schuman H. The effect of the question on survey responses: A review. Journal of the Royal Statistical Society: Series A (General). 1982 Jan;145(1):42–57.
  37. Keeter S, McGeeney K, Igielnik R, Mercer A, Mathiowetz N. From telephone to the web: The challenge of mode of interview effects in public opinion polls. Pew Research Center. 2015 May 13.
  38. Kreuter F, Presser S, Tourangeau R. Social desirability bias in CATI, IVR, and web surveys: the effects of mode and question sensitivity. Public opinion quarterly. 2008;72(5):847–65.
  39. Krumpal I. Determinants of social desirability bias in sensitive surveys: a literature review. Quality & quantity. 2013 Jun;47(4):2025–47.
  40. Kuczmarski MF, Kuczmarski RJ, Najjar M. Effects of age on validity of self-reported height, weight, and body mass index: findings from the Third National Health and Nutrition Examination Survey, 1988–1994. Journal of the American Dietetic Association. 2001;101(1):28–34.
  41. Larson MR. Social desirability and self-reported weight and height. International journal of obesity. 2000;24(5):663–5.
  42. Leary MR. Self-presentation: Impression management and interpersonal behavior. Routledge; 2019 Aug 22.
  43. Luo J, Thomson CA, Hendryx M, Tinker LF, Manson JE, Li Y, et al. Accuracy of self-reported weight in the Women’s Health Initiative. Public health nutrition. 2019;22(6):1019–28.
  44. Magelssen M, Supphellen M, Nortvedt P, Materstvedt LJ. Attitudes towards assisted dying are influenced by question wording and order: a survey experiment. BMC medical ethics. 2016;17(1):1–9.
  45. Maukonen M, Männistö S, Tolonen H. A comparison of measured versus self-reported anthropometrics for assessing obesity in adults: a literature review. Scandinavian journal of public health. 2018;46(5):565–79.
  46. Neeland IJ, Turer AT, Ayers CR, Berry JD, Rohatgi A, Das SR, et al. Body fat distribution and incident cardiovascular disease in obese adults. Journal of the American College of Cardiology. 2015;65(19):2150–1.
  47. Neermark S, Holst C, Bisgaard T, Bay-Nielsen M, Becker U, Tolstrup JS. Validation and calibration of self-reported height and weight in the Danish Health Examination Survey. European journal of public health. 2019;29(2):291–6.
  48. Ng CD. Biases in self-reported height and weight measurements and their effects on modeling health outcomes. SSM-population health. 2019;7:100405.
  49. Ng M, Fleming T, Robinson M, Thomson B, Graetz N, Margono C, et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. The lancet. 2014;384(9945):766–81.
  50. Petrinovich L, O’Neill P. Influence of wording and framing effects on moral intuitions. Ethology and Sociobiology. 1996;17(3):145–71.
  51. Pham A, Cummings M, Lindeman C, Drummond N, Williamson T. Recognizing misclassification bias in research and medical practice. Family practice. 2019;36(6):804–7.
  52. Prentice AM, Jebb SA. Beyond body mass index. Obesity reviews. 2001;2(3):141–7.
  53. Rothman KJ. BMI-related errors in the measurement of obesity. International journal of obesity. 2008;32(3):S56–9.
  54. Rothman KJ. Epidemiology: an introduction. Oxford university press; 2012 Jun 21.
  55. Schwarz N. Self-reports: How the questions shape the answers. American psychologist. 1999;54(2):93.
  56. Sell RL, Kates J, Brodie M. Use of a telephone screener to identify a probability sample of gays, lesbians, and bisexuals. Journal of homosexuality. 2007 Sep 1;53(4):163 – 71.
  57. Shah NR, Braverman ER. Measuring adiposity in patients: the utility of body mass index (BMI), percent body fat, and leptin. PloS one. 2012 Apr 2;7(4):e33308.
  58. Singer E, Von Thurn DR, Miller ER. Confidentiality assurances and response: A quantitative review of the experimental literature. Public Opinion Quarterly. 1995 Jan 1;59(1):66–77.
  59. Singer E, Couper MP. The effect of question wording on attitudes toward prenatal testing and abortion. Public Opinion Quarterly. 2014 Jan 1;78(3):751 – 60.
  60. Spencer EA, Appleby PN, Davey GK, Key TJ. Validity of self-reported height and weight in 4808 EPIC–Oxford participants. Public health nutrition. 2002 Aug;5(4):561–5.
  61. Stryker S, Burke PJ. The past, present, and future of an identity theory. Social psychology quarterly. 2000 Dec 1:284–97.
  62. Tabachnick BG, Fidell LS. Using multivariate statistics. 6th ed. Boston, MA: Pearson; 2013.
  63. Tang W, Aggarwal A, Moudon AV, Drewnowski A. Self-reported and measured weights and heights among adults in Seattle and King County. BMC obesity. 2016 Dec;3(1):1–8.
  64. Taylor AW, Grande ED, Gill TK, Chittleborough CR, Wilson DH, Adams RJ, Grant JF, Phillips P, Appleton S, Ruffin RE. How valid are self-reported height and weight? A comparison between CATI self‐report and clinic measurements using a large cohort study. Australian and New Zealand journal of public health. 2006 Jun;30(3):238–46.
  65. Tourangeau R, Rips LJ, Rasinski K. The psychology of survey response. Cambridge: Cambridge University Press; 2000.
  66. Tourangeau R, Yan T. Sensitive questions in surveys. Psychological bulletin. 2007 Sep;133(5):859.
  67. Turrell G, Nathan A, Burton NW, Brown WJ, McElwee P, Barnett AG, Pachana NA, Oldenburg B, Rachele JN, Giskes K, Giles-Corti B. Cohort profile: HABITAT—a longitudinal multilevel study of physical activity, sedentary behaviour and health and functioning in mid-to-late adulthood. International Journal of Epidemiology. 2021 Jun;50(3):730-1h.
  68. Uziel L. Rethinking social desirability scales: From impression management to interpersonally oriented self-control. Perspectives on Psychological Science. 2010 May;5(3):243–62.
  69. Wang YC, McPherson K, Marsh T, Gortmaker SL, Brown M. Health and economic burden of the projected obesity trends in the USA and the UK. The Lancet. 2011 Aug 27;378(9793):815–25.
  70. Watson N, Wooden MP. The HILDA survey: a case study in the design and development of a successful household panel survey. Longitudinal and Life Course Studies. 2012 Sep 25;3(3):369–81.
  71. Weir CB, Jan A. BMI classification percentile and cut off points. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2019.
  72. Wen M, Kowaleski-Jones L. Sex and ethnic differences in validity of self-reported adult height, weight and body mass index. Ethnicity & disease. 2012;22(1):72.
  73. World Health Organisation. Body Mass Index - BMI [Internet]. 2021 [cited 2021 Dec 23]. Available from: https://www.euro.who.int/en/health-topics/disease-prevention/nutrition/a-healthy-lifestyle/body-mass-index-bmi
  74. Yan T. Consequences of asking sensitive questions in surveys. Annual Review of Statistics and Its Application. 2021 Mar 7;8:109–27.