Obesity is an important risk factor for a wide range of chronic diseases (Ng et al. 2013; Wang et al. 2011; Guh et al. 2009). Despite research demonstrating the limitations of body mass index (BMI) as a measure of body fatness (Shah & Braverman 2012; Rothman 2008; Prentice & Jebb 2001; Burkhauser & Cawley 2008), BMI continues to be used for clinical diagnoses (Cameron et al. 2017; Neermark et al. 2019; Amster et al. 2020) and to estimate population rates of overweight and obesity (Aune et al. 2016; Cole et al. 2007; Hall & Cole 2006), with higher BMI associated with increased risk of obesity-related comorbidities and increased morbidity and mortality (Weir & Jan 2019; Després 2012; Neelund et al. 2015). BMI is calculated by dividing a person’s weight in kilograms by the square of their height in metres. A BMI of less than 18.5 is considered underweight, between 18.5 and 25 normal weight, between 25 and 30 overweight, and over 30 obese (WHO 2021).
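The calculation and the WHO cut-offs described above can be sketched as follows (a minimal illustration; the function and variable names are our own):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def who_category(bmi_value: float) -> str:
    """Classify a BMI value using the WHO cut-offs cited above."""
    if bmi_value < 18.5:
        return "underweight"
    elif bmi_value < 25:
        return "normal weight"
    elif bmi_value < 30:
        return "overweight"
    else:
        return "obese"

# Example: a person weighing 85 kg at 1.75 m tall
print(round(bmi(85, 1.75), 1))      # 27.8
print(who_category(bmi(85, 1.75)))  # overweight
```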
Ideally, height and weight are measured by a clinician, using calibrated instruments such as a stadiometer for height and weighing scales for weight (Biehl et al. 2013). However, self-report measures are often used in large population health studies due to limitations in funding and resources (Turrell et al. 2021; Watson & Wooden 2012). Research comparing self-reported height and weight data with clinical data generally finds discrepancies between the two sets of measurements, with certain groups of people over-reporting height and/or under-reporting weight (Flegal et al. 2019; Danubio et al. 2008). The result can be underestimation of BMI (Maukonen et al. 2018; Gosse 2014) and misclassification of individuals as “underweight”, “normal weight”, “overweight” and “obese” (Hodge et al. 2020), leading to lower estimates of obesity prevalence as well as greater random error (Flegal et al. 2019). Formulas designed to correct for this error have been only partly successful (Ayre 2012; Gosse 2014; Neermark et al. 2019).
Given this reliance on self-report to calculate BMI, it is important to explore ways to gather more accurate data using this approach. One possibility largely ignored in the public health literature is to improve the way the questions about height and weight are asked in surveys. It has long been known in the survey research literature that how questions are asked can have a significant impact on responses (Kalton & Schuman 1982; Schwarz 1999). It therefore seems plausible that the accuracy of self-reported height and weight could be improved by asking the questions differently.
The primary aim of this study was to determine, using an experimental design, whether the accuracy of self-reported height and weight data can be increased by improving how these questions are asked in surveys. The findings will contribute to the evidence base on understanding self-reporting bias, and help integrate literatures that currently exist somewhat separately in the psychological and survey research disciplines.
Accuracy of self-reported BMI
Studies comparing measured and self-reported BMI find that, although the correlations between the two measures are generally high (Davies et al. 2020; Hodge et al. 2020; Ng 2019), there is a bias towards overreporting of height and underreporting of weight, resulting in underestimation of BMI (Maukonen et al. 2018; Flegal et al. 2019) and subsequent misclassification of participants into BMI categories. This systematic error results in misclassification bias, of which there are two types: differential and non-differential. Differential misclassification is related to other study variables whereas non-differential misclassification is not (Rothman 2012:133). Non-differential misclassification is less likely to bias estimates, and tends to produce estimates that are “diluted” or closer to the null. This means that if there is no effect to begin with, non-differential misclassification is unlikely to bias the effect estimate (Rothman 2012:134). Biases from differential misclassification are less predictable, and can either exaggerate or underestimate an effect (Rothman 2012:134). The issue of misclassification bias is particularly pertinent for studies measuring self-reported height and weight: when all subgroups have an equal chance of being misclassified across BMI categories (i.e. when misclassification is non-differential), the resulting bias is more predictable and estimates are less likely to be biased overall.
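The dilution produced by systematic underreporting can be illustrated with a toy simulation (illustrative assumptions only: “true” BMI values drawn from a normal distribution, and a reporting error that always pushes reported BMI downwards, mimicking underreported weight and overreported height):

```python
import random

random.seed(42)

# Hypothetical population: true BMI ~ Normal(27, 5), truncated at nothing.
n = 100_000
true_bmi = [random.gauss(27, 5) for _ in range(n)]

# Self-reported BMI: each respondent shaves off a (non-negative) amount,
# on average about 0.8 BMI units, reflecting the bias described above.
reported_bmi = [b - abs(random.gauss(0.8, 0.5)) for b in true_bmi]

# Obesity prevalence (BMI >= 30) under each measure.
true_obese = sum(b >= 30 for b in true_bmi) / n
reported_obese = sum(b >= 30 for b in reported_bmi) / n

print(f"obesity prevalence, measured BMI:      {true_obese:.3f}")
print(f"obesity prevalence, self-reported BMI: {reported_obese:.3f}")
```

Because the simulated reporting error only moves respondents downwards across the BMI ≥ 30 cut-off, the self-reported prevalence comes out lower than the measured prevalence, mirroring the underestimation of obesity rates reported in the literature above.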
Existing research suggests that individuals with higher BMIs tend to underreport weight (Hodge et al. 2020; Maukonen et al. 2018; Wen & Kowaleski-Jones 2012; Tang et al. 2016), whereas older people tend to overestimate height (Amster et al. 2020; Taylor et al. 2006). Thus, misclassification appears to be differential rather than non-differential (Pham et al. 2019). Estimates of the impact of this bias range from slight to substantial (Spencer et al. 2002; Glaesmer & Brahler 2002; Engstrom et al. 2003; Kuczmarski et al. 2001; Dhaliwal et al. 2010; Ng 2019; Flegal et al. 2018). Nevertheless, all agree that more accurate data are preferable.
Explanations for this misreporting
To improve the accuracy of self-reported height and weight data, it is necessary to understand why these data are misreported. Whereas the psychological literature has mostly focused on the reporting of traits and attitudes, and the survey literature has emphasised the reporting of behaviours, it appears that similar processes lead to both types of misrepresentation (Tourangeau & Yan 2007).
The most commonly proffered explanation from both the psychology and survey methodology literatures is social desirability (Leary 2019). This theory argues that people have a strong desire for others to see them in a positive light. In cultures that favour lower weight and greater height, people may report being taller and weighing less than their actual measurements to promote a more positive picture of themselves to others, such as a survey interviewer (Larson 2000). A recent study supporting this theory found that women’s social desirability scores were significantly correlated with the discrepancy between self-reported and measured body weights after adjusting for their actual weight (Luo et al. 2019).
This distorted self-presentation may constitute either a “deliberately deceptive act” (i.e. impression management) or simply a “self-serving estimation error” (i.e. self-deception) (DeAndrea et al. 2012; Uziel 2010; Braun 2001). DeAndrea et al. (2012) argue that one may distinguish between the two possibilities by establishing whether “ground truth” is present – i.e., knowledge of one’s true height and weight. In other words, if someone knows their actual height and weight, any reported distortion of these data is deliberate, whereas if they are unsure of their actual height and weight, or at least have convinced themselves that they are unsure, they may simply report the data favourably. This theory suggests that, if one could either determine or enhance “ground truth”, accurate reporting of height and weight would be enhanced.
If the theory of social desirability is correct as applied to the self-reporting of height and weight, and people misreport their height and weight in order to influence an interviewer to think better of them, then one solution to this data bias problem would be to remove the influence of interviewers and instead conduct the survey using an anonymous mode, such as online or mail, rather than over the telephone or face-to-face. A considerable body of research, however, finds that in many cases more socially desirable responses are provided to survey questions even when there is no one asking the questions, thus casting doubt on this theory as the sole explanation for the misreporting of height and weight (Gnambs & Kaspar 2017). Kreuter et al. (2008), for example, found no differences in responses between interviewer- and self-administered modes for a set of five normative behaviours, including receiving academic honours and donating money to the university. Research by the Pew Research Center (Keeter et al. 2015) found little difference in the reported frequency of church attendance from participants assigned randomly to a telephone interview or a web survey.
Another possible explanation for bias in self-reported height and weight is based on identity theory, which concerns what people value and how people view themselves (Stryker & Burke 2000). Rather than providing survey responses to convince the interviewer that they are a worthy person, survey participants may instead be expressing their self-identity as a worthy (i.e. a slightly taller and lighter) person. The participant sees themselves, or wants to see themselves, as healthy, active and attractive, and thus responds to the height and weight questions in a way that more closely accords with this self-view. If someone values being fit and attractive, and views themselves as fit and attractive, they may underreport their true weight and/or overreport their true height as a low-cost opportunity to enact their identity (Brenner & DeLamater 2016). Brenner & DeLamater (2016:337) posit that, rather than being motivated solely by concerns regarding self-presentation, “the respondent pragmatically reinterprets the question to be one about identity rather than behavior, a process influenced by a desire for consistency between the ideal self and the actual self. This pragmatic interpretation of the survey question encourages the respondent to answer in a way that affirms strongly valued identities.” Identity theory, unlike social desirability theory, does not predict that responses to socially desirable questions will be more biased in non-anonymous survey modes (i.e. when another person is asking the questions), but instead predicts greater bias when self-identity does not accord closely with reality. Thus, conventional direct survey questions can prompt the participant to reflect not only on the actual self, but also on their ideal self (Higgins 1987).
Impact of question wording on responses to sensitive questions
It is clear from the survey research literature that how survey questions are asked can affect responses. This is particularly true for “sensitive” questions, such as illicit drug use, abortion, and sexual behavior, and “intrusive” questions such as household income, although what is considered sensitive or intrusive likely differs by demographic group, cultural background (Johnson & Van de Vijver 2003) and individual (Tourangeau & Yan 2007).
There is evidence that specifically asking participants to provide accurate information, sometimes referred to as a priming procedure, improves the accuracy of responses to sensitive or intrusive survey questions (Tourangeau & Yan 2007). Another promising approach to improving the accuracy of self-reported height and weight is to provide additional assurances regarding the confidentiality of the data, which have been shown to reduce misreporting (Singer et al. 1995). Although most surveys provide such assurances at the start of the survey, or as part of the informed consent process, additional reassurance immediately prior to asking the height and weight questions may improve reporting.
Finally, framing effects may be important (de Bruin et al. 2011). Framing refers to the process by which people perceive and conceptualise an issue. Framing effects occur when changes in the presentation of an issue produce changes of opinion (Chong & Druckman 2007). Two subtypes of framing effects are wording and context effects. Context effects refer to the influence on survey responses of the context in which a question is asked; wording effects refer to the language used to ask the question. These effects have been observed on an array of issues (e.g., Petrinovich & O'Neill 1996; Druckman 2001; Feldman & Hart 2018). Although normally discussed in relation to attitudes, framing effects may also be important for other types of survey responses, such as self-reporting of height and weight. Little research, however, has examined their impact on these types of questions.
Magelssen et al. (2016), for example, examined the impact of context and wording on support for assisted dying in Norway, by randomly assigning participants to different versions of the same questions. In one version, participants were simply asked whether they agreed or disagreed that physician-assisted suicide should be allowed for persons who have a terminal illness with short life expectancy. The second version added information that included an example of a particular patient who ‘is in great pain’, careful consideration by a doctor, and the choice of the patient to ‘avoid great suffering.’ Whereas the first version asks about ‘physician-assisted suicide’ and ‘euthanasia’, the second version uses the phrase ‘a lethal drug dose that the patient can choose to take to avoid great suffering’. The result was significantly greater support for assisted dying among participants assigned to the second version. Another example of wording effects, in the area of economic attitudes, found that expectations and perceptions regarding future inflation rates were lower and less variable when participants were asked about “inflation” as opposed to “prices in general” or “prices you pay” (de Bruin et al. 2012). These effects of context and wording, however, do not always hold. Singer and Couper (2014), for example, conducted an experiment in which they randomly assigned participants to questions about attitudes toward prenatal testing and abortion framed either in terms of “baby” or “fetus”, with the expectation that support would be higher for those assigned to the second version. They found, however, no significant differences by question wording for abortion preferences and small but significant differences for prenatal testing. They did, however, find that question wording made substantial differences in the responses of some demographic subgroups. It may be that attitudes towards abortion, in particular, are so strongly held by many that framing effects have little impact.
Finally, the presence of an authoritative citation, where the question is asked with the addition of an authoritative statement supporting it, has also been shown to affect survey responses – again, mostly on attitude questions (Cocco & Tuzzi 2013). Cocco & Tuzzi (2013), in an Italian study examining the impact of question wording and context on attitudes towards homosexual behaviour and a possible law against homophobia, found more negative responses with the addition of the following statement: “Silvio Berlusconi has stated that it is better to appreciate beautiful girls than to be gay.” One may argue about the “authoritativeness” of such a statement; nevertheless, the point holds that the statement is attributed to a person of authority.
The aim of this study was to determine, using an experimental design, whether the accuracy of self-reported height and weight data can be increased by improving how these questions are asked in surveys, drawing on the relevant evidence from the psychology and survey research literatures. Four hypotheses are tested.