Body image is a multifaceted construct that represents one’s cognitions, perceptions, and behaviours relating to one’s body [10; 27; 36]. Contemporary research has largely focused on a single component of body image, emphasising its negative aspects, primarily among people seeking mental health treatment [5; 12; 35]. In contrast, positive psychology, an outlook entrenched in hygiology (the promotion of health; [35]), provides a conceptual structure to guide the study of positive body image as distinct from negative body image.
One area of positive body image that has yet to be explored in depth is functionality appreciation (FA). Alleva et al. [3] characterised FA as “appreciating, respecting, and honouring the body for what it is capable of doing” (p. 29). Given its importance to positive body image [30], FA has been targeted in interventions aimed at enhancing positive feelings and attitudes towards one’s body (e.g., [1; 2; 11]). To address the lack of adequate measures for assessing FA, Alleva et al. [3] developed the Functionality Appreciation Scale (FAS). An exploratory factor analysis supported a unidimensional solution with seven items, and a confirmatory factor analysis upheld the construct’s unidimensionality and supported invariance across gender [3]. The FAS has shown adequate convergent, criterion-related, and divergent validity, adequate test-retest reliability, and good internal consistency [3; 23; 28]. Moreover, Alleva et al. [3] reported that FAS scores showed good divergent validity, as well as adequate convergent and criterion-related validity, indicated by positive and significant associations with measures of body image (e.g., body appreciation), positive self-care (e.g., self-compassion), and psychological well-being (e.g., life satisfaction).
Considering that functionality appreciation is suggested to vary between men and women, previous research has tested measurement invariance (MI) to investigate the stability of FAS psychometric properties across these groups [3; 31; 32; 33; 34]. Results indicate that the FAS is invariant across men and women in adult samples from Italy, Malaysia, Romania, the United States of America (USA), and the United Kingdom (UK; [3; 31; 32; 33; 34]). However, item intercepts (i.e., the scalar level of invariance) differed between men and women in samples from the UK and Malaysia [34]. Thus, specific items (e.g., “I acknowledge and appreciate when my body feels good and/or relaxed”) are suggested to function differently for men and women [34].
Overall, the FAS presents seemingly sound psychometric properties under Classical Test Theory (CTT; e.g., confirmatory factor analysis and measurement invariance). However, FAS psychometric properties have yet to be investigated using newer methodologies such as Item Response Theory (IRT).
Item Response Theory (IRT).
Previous literature has indicated that IRT outperforms CTT in psychometric estimation [13] for two main reasons [16]. Firstly, whilst CTT explains associations among items and a construct, IRT facilitates examining associations between items and individuals with different levels of the latent trait (i.e., item-participant relationships; [6; 13; 14; 19]). Secondly, unlike CTT, IRT can estimate reliability coefficients at both the test and the item level [16]. Analysing reliability at the item level can provide greater insight into measurement precision, enabling a robust evaluation of internal construct and item validity [13; 24].
In IRT contexts, the item-participant relationship is represented by the probability that participants with a certain level of the latent trait (in this case FA) will endorse a particular item [16]. This is graphically represented by the item response function (IRF; [16]), a nonlinear (logit) regression curve. The probability that a participant will endorse a particular item is contingent on several item parameters, including difficulty (β) and discrimination (α) [16]. Difficulty (β) specifies the level of the latent trait at which a participant becomes likely to endorse a specific item or criterion [18; 24]. For example, ‘easier’ items have lower β values and their IRF is located further to the left along the horizontal (latent trait) axis [16]. Discrimination (α) describes how steeply the probability of a positive response changes with an individual’s level of the latent trait [18]. Thus, items that are more strongly associated with the latent variable show steeper IRFs and, in turn, discriminate more accurately between different levels of the latent trait (i.e., FA; [24; 36]).
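The roles of β and α can be illustrated with a minimal sketch. It assumes the standard two-parameter logistic (2PL) form of the IRF for a dichotomous response (endorse versus not endorse), which is a simplification of the polytomous FAS items. The function name and all parameter values are hypothetical and chosen only for illustration; they are not estimates from any FAS dataset.

```python
import math


def irf_2pl(theta, alpha, beta):
    """Probability of endorsing an item under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


# Hypothetical item parameters, chosen only for illustration.
easy_item = {"alpha": 1.5, "beta": -1.0}
hard_item = {"alpha": 1.5, "beta": 1.0}
steep_item = {"alpha": 2.5, "beta": 0.0}
flat_item = {"alpha": 0.5, "beta": 0.0}

# When theta equals beta, the endorsement probability is 0.5.
p_at_difficulty = irf_2pl(1.0, **hard_item)

# At an average trait level (theta = 0), the easier item is more likely
# to be endorsed than the harder item.
p_easy = irf_2pl(0.0, **easy_item)
p_hard = irf_2pl(0.0, **hard_item)

# Change in endorsement probability between theta = -0.5 and theta = 0.5:
# a larger alpha yields a larger change, i.e. a steeper IRF.
steep_change = irf_2pl(0.5, **steep_item) - irf_2pl(-0.5, **steep_item)
flat_change = irf_2pl(0.5, **flat_item) - irf_2pl(-0.5, **flat_item)

print(f"P(endorse) at theta = beta: {p_at_difficulty:.3f}")
print(f"Easy vs hard item at theta = 0: {p_easy:.3f} vs {p_hard:.3f}")
print(f"Change over [-0.5, 0.5], steep vs flat: {steep_change:.3f} vs {flat_change:.3f}")
```

Under these assumptions, β shifts the curve along the trait axis without changing its shape, whereas α changes only how sharply the curve rises around β.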
IRT models are classified by the number of item parameters they estimate (i.e., the parameter logistic, PL; [13]). While Rasch models assume equal α across items, the Graded Response Model (GRM) and the Generalised Partial Credit Model (GPCM) allow β and α to be freely estimated across items [16; 24]. Both the GRM and the GPCM accommodate ordered polytomous data (e.g., Likert scales); however, the GRM has been the preferred model as it enables comparisons across non-adjacent response categories (e.g., “strongly disagree” and “strongly agree”; [18; 24; 36]).
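As a minimal sketch of how a graded response model handles ordered categories, the example below assumes the standard cumulative formulation, in which each threshold contrasts all categories below it with all categories at or above it, and category probabilities are obtained as differences between adjacent cumulative curves. The five-category item, its discrimination, and its thresholds are hypothetical values, not FAS estimates.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, alpha, thresholds):
    """Category probabilities for one item under an assumed GRM.

    thresholds: increasing list of K - 1 threshold (difficulty) values
    for an item with K ordered response categories.
    """
    # Cumulative probability of responding in category k or higher;
    # the bounds 1.0 and 0.0 close the lowest and highest categories.
    cumulative = [1.0]
    cumulative += [logistic(alpha * (theta - b)) for b in thresholds]
    cumulative += [0.0]
    # Each category probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical five-point Likert item (four thresholds).
alpha = 1.8
thresholds = [-2.0, -1.0, 0.0, 1.0]

probs_low = grm_category_probs(-2.5, alpha, thresholds)   # low trait level
probs_high = grm_category_probs(2.5, alpha, thresholds)   # high trait level

print([round(p, 3) for p in probs_low])
print([round(p, 3) for p in probs_high])
```

With these illustrative values, respondents low on the trait are most likely to choose the lowest category and respondents high on the trait the highest, while the probabilities for each respondent sum to one.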
Additionally, differential item functioning (DIF) analysis can be employed to assess whether different groups (e.g., men and women) with the same level of the latent trait respond differently to certain items within a scale (e.g., the FAS; [24; 26]). According to Camilli and Shepard [9], there are three reasons why IRT methods are more suitable than CTT methods for detecting DIF [15]. Firstly, the item characteristic curve (ICC) can be plotted for each group (e.g., men and women), which makes items displaying DIF easier to interpret. Secondly, IRT describes the statistical properties of items more fully than CTT, as it can locate where an item functions differently (i.e., in discrimination or difficulty). Finally, item parameter estimates derived from IRT are less confounded with, and less influenced by, sample-specific characteristics [15; 24].
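The distinction between DIF in difficulty and DIF in discrimination can be sketched by comparing group-specific ICCs. The example assumes a dichotomous 2PL curve for simplicity and uses the area between the two groups' curves purely as an illustrative summary; it is not the DIF statistic used in the studies cited here. All parameter values and function names are hypothetical.

```python
import math


def icc(theta, alpha, beta):
    """Item characteristic curve under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def unsigned_area(params_a, params_b, lo=-4.0, hi=4.0, steps=800):
    """Approximate area between two groups' ICCs (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        total += abs(icc(theta, *params_a) - icc(theta, *params_b)) * width
    return total


# Hypothetical (alpha, beta) pairs for the same item in two groups.
reference = (1.5, 0.0)
shifted_difficulty = (1.5, 0.5)      # same slope, harder for the second group
changed_discrimination = (0.8, 0.0)  # same location, flatter for the second group

no_dif = unsigned_area(reference, reference)
difficulty_dif = unsigned_area(reference, shifted_difficulty)
discrimination_dif = unsigned_area(reference, changed_discrimination)

# Signed gaps at a low and a high trait level show the pattern of DIF.
gap_difficulty = [icc(t, *reference) - icc(t, *shifted_difficulty) for t in (-1.0, 1.0)]
gap_discrimination = [icc(t, *reference) - icc(t, *changed_discrimination) for t in (-1.0, 1.0)]

print(f"Area, identical curves: {no_dif:.3f}")
print(f"Area, difficulty shift: {difficulty_dif:.3f}")
print(f"Area, discrimination change: {discrimination_dif:.3f}")
```

In this sketch, a difference in β favours the same group at both trait levels, whereas a difference in α reverses direction between low and high trait levels, so the two groups' curves cross.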