Functionality Appreciation Scale (FAS): Item Response Theory Examination


Background: The present study considers a measure of positive body image, the Functionality Appreciation Scale (FAS), which assesses one’s perception of appreciating, respecting, and honouring the body for what it can do [3]. Differential functioning of the scale across groups (i.e., gender) is yet to be investigated. The present study contributes to this area of knowledge by employing Item Response Theory (IRT) analyses. Methods: A sample of 394 adults from Canada, Australia, New Zealand, Ireland, the United Kingdom (UK), and the United States of America (USA) was assessed online (54.8% men, 43.1% women, Mage = 27.48, SD = 5.57); the 386 participants who identified as men or women were retained for analysis. Results: The two-parameter logistic model employed to observe IRT properties indicated that all items demonstrated strong, although variable, discrimination capacity. Considering the DIF across men and women, all items demonstrated psychometric invariance across groups, indicating that the FAS measures FA equally in men and women. Conclusions: The items showed increased reliability for latent levels within ± 2 SD of the mean level of Functionality Appreciation. The implications and interpretations of the findings for clinical practice are discussed.


Introduction
Body image is a multifaceted construct that represents one's cognitions, perceptions, and behaviours relating to one's body [10; 27; 36]. Contemporary research has focused on a unidimensional component of body image by emphasizing a negative connotation, primarily related to mental health treatment seekers [5; 12; 35]. Alternatively, positive psychology, an outlook entrenched in hygiology (the promotion of health; [35]), provides a conceptual structure to guide the study of positive body image as distinct from negative body image.
One area of positive body image that has yet to be explored in depth is functionality appreciation (FA). Alleva et al. [3] characterised FA as "appreciating, respecting, and honouring the body for what it is capable of doing" (p. 29). Given its importance to positive body image [30], FA has been employed in interventions aimed at enhancing positive feelings and attitudes towards one's body (e.g., [1; 2; 11]). Given the lack of sufficient measures for assessing FA, Alleva et al. [3] pioneered the Functionality Appreciation Scale (FAS). An exploratory factor analysis supported a unidimensional solution with seven items, and a confirmatory factor analysis maintained the construct's unidimensionality and yielded invariance across gender [3]. The FAS has shown adequate convergent, criterion-related, and divergent validity, adequate test-retest reliability, and good internal consistency [3; 23; 28]. Moreover, Alleva et al. [3] reported that this validity was indicated by positive and significant associations between FAS scores and measures pertaining to body image (e.g., body appreciation), positive self-care (e.g., self-compassion), and psychological well-being (e.g., life satisfaction).
Considering that functionality appreciation is suggested to vary across men and women, previous research has employed measurement invariance (MI) to investigate the stability of FAS psychometric properties across these groups [3; 31; 32; 33; 34]. Results indicate that the FAS is invariant across males and females in adult samples from Italy, Malaysia, Romania, the United States of America (USA), and the United Kingdom (UK; [3; 31; 32; 33; 34]). However, item intercept values (i.e., scalar invariance) were different for men and women in samples from the UK and Malaysia [34]. Thus, specific items (e.g., "I acknowledge and appreciate when my body feels good and/or relaxed") are suggested to function differently for men and women [34].
Overall, the FAS presents seemingly sound psychometric properties using Classical Test Theory (CTT; e.g., confirmatory factor analysis and measurement invariance). Analysing reliability coefficients at the item level can provide greater insights into measurement reliability, enabling a robust evaluation of internal construct and item validity [13; 24]. In Item Response Theory (IRT) contexts, the item-participant relationship is represented by the probability that participants with a certain level of the latent trait (in this case FA) will endorse a particular item [16]. This is graphically represented by the item response function (IRF; [16]) through a nonlinear (logit) regression line. The probability that a participant will endorse a particular item is contingent on several item parameters, including difficulty (β) and discrimination (α). IRT models vary according to the number of parameters estimated in the logistic model (parameter logistic, PL; [13]). While Rasch models assume equal α across different items, the Graded Response Model (GRM) and the Generalised Partial Credit Model (GPCM) allow free estimation of β and α across items [16; 24]. Both the GRM and the GPCM accommodate ordered polytomous data (e.g., Likert scales); however, the GRM has been the preferred model as it enables comparisons across non-adjacent categories of responses (e.g., "strongly disagree" and "strongly agree"; [18; 24; 36]).
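To make the role of α and β concrete, the following is a minimal sketch of how the GRM turns one discrimination parameter and a set of ordered thresholds into category probabilities. It uses the standard Samejima formulation (cumulative logistic boundary curves whose adjacent differences give the category probabilities). The function name and the four threshold values are hypothetical and chosen only for illustration; the discrimination value of 2.79 is the one reported for item 1 later in this paper.

```python
import math

def grm_category_probs(theta, alpha, betas):
    """Return P(X = k | theta) for each ordered category under the GRM.

    theta : latent trait level (in SD units)
    alpha : item discrimination
    betas : ascending threshold (difficulty) parameters,
            one fewer than the number of response categories
    """
    # Boundary curves: probability of responding in category k or higher.
    # The lowest boundary is 1 (every response is at least the first
    # category) and the boundary above the top category is 0.
    boundaries = [1.0]
    boundaries += [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    boundaries.append(0.0)
    # Each category probability is the difference of adjacent boundaries.
    return [boundaries[k] - boundaries[k + 1] for k in range(len(betas) + 1)]

# Hypothetical 5-point item: the thresholds below are made up for
# illustration and are NOT estimates from this study.
probs = grm_category_probs(theta=0.0, alpha=2.79, betas=[-2.5, -1.5, -0.5, 0.8])
print([round(p, 3) for p in probs])
```

Larger values of α make the boundary curves steeper, so the most likely category changes more sharply as the latent trait increases; the thresholds locate where those changes occur.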
Additionally, differential item functioning (DIF) can be employed to assess whether different groups (e.g., men and women) respond differently to certain items within a scale (e.g., the FAS; [24; 26]). According to Camilli and Shepard [9], there are three reasons why IRT methods are more suitable than CTT methods to detect DIF [15]. Firstly, a graphic illustration of the item characteristic curve (ICC) for each group (e.g., men and women) is exhibited, which in turn increases the comprehensibility of items displaying DIF. Secondly, statistical properties of items are enhanced through IRT (as opposed to CTT), as this method is able to locate where the item functions differently (i.e., discrimination or difficulty). Finally, item parameter estimates derived from IRT are less confounded with, and less influenced by, sample-specific characteristics [15; 24].

Participants
Upon receiving approval from the Victoria University Ethics Committee, participants were recruited online via a crowdsourcing platform (Prolific.co) and were awarded $2.50 each for their time. As part of a larger study, 394 participants completed an online survey including the FAS. Omission of items was not allowed by the Qualtrics-setting parameters. The sample included 216 men and 170 women, whilst eight participants identified as non-binary. These eight participants were excluded from the present analyses targeting gender differences. The remaining participants' age ranged from 18 to 39 years (M = 27.54, SD = 5.58). Only the 386 full responses were utilised for statistical analyses, resulting in a maximum random sampling error of .089 for a 95% confidence interval and .117 for a 99% confidence interval. Most participants reported an undergraduate degree (40.4%) and Caucasian ethnicity (57.8%), lived in the USA (54.9%), were heterosexual (80.5%), and worked full-time (44.3%).

Measures
The FAS [3] contains seven items rated on a 5-point Likert scale ranging from 1 ("strongly disagree") to 5 ("strongly agree"). Scores from each item (e.g., "I appreciate that my body allows me to communicate and interact with others") are averaged, which produces an aggregate score, where higher scores reflect greater functionality appreciation. Table 1 presents a description of the items and descriptive statistics for the current sample. In the present study, the internal consistency of the FAS was acceptable (Cronbach's α = .93, McDonald's ω = .95) and consistent with previous research (Cronbach's α = .86 [3]).

... was sufficient to determine that our GRM represented good fit to the data [25]. The acceptable fit to the data provided by the GRM validates that a unidimensional factorial structure appropriately represents the FAS.
Considering item parameters, discrimination power for all items was in the very high range (0 = non-discriminative; 0.01-0.34 = very low; 0.35-0.64 = low; 0.65-1.34 = moderate; 1.35-1.69 = high; > 1.70 = very high; [4]), ranging between 2.79 (α item 1) and 4.08 (α item 7). The descending sequence of the items' discrimination power is 7, 5, 6, 3, 4, 2, and 1 (see Table 2). Regarding the item difficulty parameters (β), there were fluctuations between the different thresholds across the 7 items. Indicatively, for the first threshold the ascending item sequence of difficulty was 3, 6, 2, 7, 4, 5, and 1, with item 1 as the most difficult item. Considering the fourth threshold, this alternated to 1, 6, 2, 4, 7, 3, and 5. Nonetheless, the threshold difficulty parameters gradually increased between the first and the last threshold across all items (see Table 2 and Figure 2). In sum, IRT showed that: (i) increasing item scores correctly described increasing levels of FA behaviours across all items, (ii) the rate of these increases differs from item to item, and (iii) different thresholds perform differently from item to item considering their level of difficulty.
Considering the items' reliability across the different levels of the latent trait, controlling concurrently for the different levels of items' difficulty, meaningful variations were confirmed. Indicatively, the item information functions (IIFs) of items 5 and 7 provided the highest levels of information/reliability, although with some variability (within 1 SD), in the range between 2 SD above and below the mean. The IIFs of items 2, 4, and 6 showed better performance in the range between 2 SD above and below the mean, with significant drops in the areas of 3 SD above and below the mean. Items 1 and 3 showed the lowest (yet acceptable) level of reliability in the area between 3 SD below the mean and 2 SD above the mean, with a significant drop for behaviours exceeding 1 SD above the mean (see Figure 3).
However, a common feature across all items was a reduction in the reliability index after 1 SD.
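The discrimination bands quoted in the results can be written as a small lookup, which makes it easy to check where any estimated α falls. This is only a sketch: the function name is hypothetical, the cut points follow the ranges listed in the text, and the treatment of values falling between the printed bands (for example exactly 1.70, or 0.345) is an assumption, since the text does not specify it.

```python
def discrimination_band(alpha):
    """Label an item discrimination value using the bands quoted in the text."""
    if alpha < 0:
        # Negative discrimination is not covered by the quoted bands.
        raise ValueError("alpha must be non-negative")
    if alpha == 0:
        return "non-discriminative"
    # Assumption: each band runs up to (but excludes) the next cut point,
    # so values between the printed ranges fall into the lower band.
    if alpha < 0.35:
        return "very low"
    if alpha < 0.65:
        return "low"
    if alpha < 1.35:
        return "moderate"
    if alpha < 1.70:
        return "high"
    # Assumption: 1.70 itself is treated as "very high".
    return "very high"

# The smallest and largest discriminations reported for the FAS items.
for a in (2.79, 4.08):
    print(a, discrimination_band(a))
```

Both reported extremes fall in the top band, which is consistent with the statement that every FAS item discriminates very strongly.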
Considering the performance of the scale as a whole, this is visualized by the Test Characteristic Curve (TCC) and the Test Information Function (TIF) figures. The TCC graph illustrates that the trait of FA inclined steeply as the total score reported increased (see Figure 3). Considering the information provided by the scale, improved information (TIF) scores were observed from around 1.5 SD below the mean up to about 1 SD above the mean (see Figure 3). Interestingly, participants with high levels of FA seemed to endorse the most difficult category (i.e., "strongly agree") on all seven items. This can be visualised in the right side of the TCC curve (Figure 4, left panel), suggesting that this test could not accurately assess participants scoring 2 SD above the mean. Indeed, reliability indices drop significantly after 1.5 SD above the mean (TIF; Figure 4, right panel) as items seem 'too easy' for participants with high levels of the latent trait.
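For readers less familiar with these curves, the sketch below shows how a TCC and a TIF can be computed from GRM item parameters. The TCC is the sum of the expected item scores at a given trait level, and the TIF is the sum of the item information functions, where GRM item information is the sum over categories of the squared slope of the category probability divided by that probability. All function names and all threshold values are hypothetical; only the two discrimination values (2.79 and 4.08) are taken from this paper, and only two items are sketched rather than all seven.

```python
import math

def boundaries(theta, alpha, betas):
    """Cumulative GRM curves P(X >= k), padded with 1 below and 0 above."""
    inner = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    return [1.0] + inner + [0.0]

def expected_item_score(theta, alpha, betas, lowest=1):
    """Expected response on one item, categories scored lowest, lowest+1, ..."""
    cum = boundaries(theta, alpha, betas)
    probs = [cum[k] - cum[k + 1] for k in range(len(betas) + 1)]
    return sum((lowest + k) * p for k, p in enumerate(probs))

def item_information(theta, alpha, betas):
    """GRM item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = boundaries(theta, alpha, betas)
    info = 0.0
    for k in range(len(betas) + 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of a logistic boundary curve is alpha * P * (1 - P).
        slope = alpha * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p_k > 0:  # skip categories with numerically zero probability
            info += slope ** 2 / p_k
    return info

# Hypothetical two-item "test"; thresholds are illustrative, not estimates.
ITEMS = [
    (2.79, [-2.5, -1.5, -0.5, 0.8]),
    (4.08, [-2.4, -1.6, -0.6, 0.6]),
]

def tcc(theta):
    """Test characteristic curve: expected total score at theta."""
    return sum(expected_item_score(theta, a, b) for a, b in ITEMS)

def tif(theta):
    """Test information function: summed item information at theta."""
    return sum(item_information(theta, a, b) for a, b in ITEMS)

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(tcc(theta), 2), round(tif(theta), 2))
```

When all thresholds sit below a given trait level, as in a ceiling effect, the expected score flattens near its maximum and the information falls towards zero. Because the standard error of the trait estimate is approximately 1 divided by the square root of the test information, measurement becomes less precise in exactly that region.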
These findings suggest that the FAS provides a sufficient and reliable psychometric measure for assessing individuals in the range between 2 SD below and 1 SD above the mean. Nevertheless, it may not be an ideal measure for individuals with extremely low (i.e., more than 2 SD below the mean) or relatively high FA behaviours (i.e., more than 1 SD above the mean). FA at the levels of 2 SD below and above the mean trait level corresponds with mean scores of 13 and 35 respectively, and these could be suggested as conditional (pending confirmation by clinical assessment) diagnostic cut-off points [18]. Accordingly, 0% of the participants scored below 2 SD and 30.5% scored above 2 SD, that is, in the extreme range suggestive of over-confidence in FA. However, this consideration needs to be applied with caution, as the FAS does not seem to accurately detect significant differences for individuals with high FA levels.
Considering DIF of the FAS across men and women, sources of non-invariance at the item level were not detected. DIF statistics were observed (see Table 3) for all items, with no significant discrepancies across groups at the p < 0.05 level (total χ²). This indicates that items are invariant across gender groups.

Discussion
The present study is the first of this type to employ item response theory (IRT) procedures to assess the psychometric properties of the FAS at both the scale and the item level for an English-speaking sample.
Although all items presented with high discrimination capacity, this fluctuated according to the following descending sequence of items: 7, 5, 6, 3, 4, 2, and 1. Similarly, items' difficulty parameters differed across the different item thresholds. Considering the FAS as a whole scale, it seems to perform sufficiently and reliably for examining FA levels between 2 SD below and above the mean. However, this measure of FA may not be ideal for individuals experiencing extremely high FA (scores that lie 2 SD above the mean).
Finally, considering the DIF across men and women, all items demonstrated psychometric invariance across groups indicating that FAS measures FA equally in men and women.
Scale and Item Discrimination, Difficulty, and Reliability
The findings from the IRT analysis supported the unidimensionality of the FAS. Considering that IRT principles relate to the identification of the most appropriate items for the evaluation of a specific level of a latent trait, items were evaluated and ranked in relation to their discrimination, difficulty, and reliability [16]. We considered various aspects of IRT, including discrimination, difficulty, and information functions, across thresholds and different levels of the latent trait. Specifically, all items yielded very high discriminative power. The items that yielded the highest discrimination were "I respect my body for the functions that it performs" and "I am grateful that my body enables me to engage in activities that I enjoy or find important". This shows that these two items best distinguished between high and low FA among groups. Accordingly, when differentiating between those experiencing high and low levels of FA across gender, clinicians should be more inclined to focus on items pertaining to respect and gratitude for what one's body can do in activities that are enjoyable and important.
Further, while the level of difficulty of endorsing an item increased between the first ("strongly disagree") and last ("strongly agree") options of the Likert scale, the sequence of item difficulty varied across thresholds. Specifically, the ascending order of item difficulty between the first ("strongly disagree") and second ("disagree") options of the Likert scale was 3, 6, 2, 7, 4, 5, and 1. However, the ascending order of item difficulty between the fourth ("agree") and last ("strongly agree") options of the Likert scale was 1, 6, 2, 4, 7, 3, and 5. This suggests that participants felt more inclined to endorse "strongly disagree" or "disagree" when rating statements about being appreciative of what the body is capable of doing and feeling that the body does so much. Alternatively, participants felt more inclined to endorse "agree" or "strongly agree" when rating statements about being appreciative that the body allows them to communicate and interact with others and being grateful for what one's body can do in activities that are enjoyable and important. Therefore, it is proposed that items should be interpreted differently when conducting clinical assessment of FA.
Specifically, more difficult items (i.e., 3, 6, and 2) should be prioritised with those who have high levels of FA, and easier items (i.e., 7, 4, 5, and 1) should be prioritised when focusing on low levels of FA. Lastly, a common feature across all items was a reduction in the reliability index after 1 SD. Thus, when conducting clinical assessment of FA, the FAS may not be reliable with scores beyond 1 SD; this is because the items within the FAS may not have been challenging enough for the subjects (i.e., ceiling effect; [17]).
As illustrated by the TIF, enhanced information performance was evident within ± 2 SD of the mean. However, appreciable disparities were observed regarding the amount of information precision displayed by each item. Particularly, findings outlined that item 7 ("I respect my body for the functions that it performs") yielded the strongest information/reliability between ± 2 SD and ± 1.5 SD of the mean. Items 5 ("I am grateful that my body enables me to engage in activities that I enjoy or find important") and 6 ("I feel that my body does so much for me") yielded a considerable amount of information/reliability within ± 2 SD of the mean. Last, items 2 ("I am grateful for the health of my body, even if it isn't always as healthy as I would like it to be") and 1 ("I appreciate my body for what it is capable of doing") consistently provided a low amount of information/reliability within ± 3 SD of the mean. However, the aforementioned items, together with items 3 ("I appreciate that my body allows me to communicate and interact with others") and 4 ("I acknowledge and appreciate when my body feels good and/or relaxed"), yielded the most information between 2 and 3 SD below the mean. This demonstrates that the items (i) "I appreciate my body for what it is capable of doing", (ii) "I am grateful for the health of my body, even if it isn't always as healthy as I would like it to be", and (iii) "I acknowledge and appreciate when my body feels good and/or relaxed" should take precedence when identifying participants with significantly low levels of FA.
Additionally, the Test Characteristic Curve (TCC) demonstrated an appropriate steepness, indicating that the FAS clearly identifies increments in FA as the overall score increases. This favours the FAS as a sufficient psychometric measure for the assessment of individuals with high and low levels of FA. Nonetheless, the instrument's performance decreases significantly when differentiating very low (-3 SD) and very high (+3 SD) FA levels.
Moreover, the DIF analysis revealed that no item functioned differently between men and women. This supports the MI analyses from previous literature, in which non-invariance at the intercept level between men and women was not found for all items [3; 31; 32; 33; 34].
Overall, the FAS presents as a useful questionnaire for clinicians interested in gaining a fine-grained understanding of body image and appreciation of one's body functionality. Specifically, considering the high discrimination capacity that FAS items demonstrate, clinicians can adequately assess and identify participants with different levels of body functionality appreciation. However, whilst the FAS is a psychometrically sound scale, its use should be complemented with formal assessment (e.g., clinical interviews) or other concurrent tests (e.g., another scale), given the ceiling effect observed within the study. Therefore, clinicians interested in identifying or assessing individuals with significantly high levels of FA may wish to explore this in further depth with the use of other tools.

Conclusion, Limitations And Further Research
Firstly, IRT analysis using a GRM determined that the scale meets the assumptions required for IRT analysis of discrimination and difficulty. Following this, we found differing discriminative power across items, with "I respect my body for the functions that it performs" and "I am grateful that my body enables me to engage in activities that I enjoy or find important" having the strongest degree of discrimination. These items should be preferred over other FAS items when differentiating between high and low levels of FA. Item difficulty also indicated that the scale is most reliable at assessing FA in non-clinical populations, but its reliability decreases as scores deviate from the normative levels, particularly at clinically low levels. Future research utilising the FAS should also consider psychological disorder diagnostics and exclude those meeting clinically significant criteria for psychological disorders relating to FA. Alternatively, more discriminative items should be used to assess individuals with an extremely high or low state of FA, as outlined in this study. Results reported from this study provide information for clinicians and researchers to determine the appropriate use of the FAS for their population of interest [24].
This analysis complements and extends existing research [3; 31; 32; 33; 34] and is a worthwhile tool for increasing the quality of psychological questionnaires and psychological examination. Notwithstanding the contribution this study makes to the appraisal of FAS psychometric properties, several limitations should be highlighted. The employed sample included adult English speakers from developed countries, and the findings may not generalise to samples involving non-English speakers, youth, and older adults. Additionally, IRT properties may not accurately reflect those experiencing pathological mental illness, as a community sample of healthy adults was employed. Future studies may wish to address the shortcomings of the present study to improve and expand upon assessment practices typified by the FAS.
In conclusion, the present findings indicate that FA evaluations and associations within gender based on the FAS should be interpreted with caution because of response pattern differences, which affect the metric and the scale properties of the instrument. Moreover, the instrument may not perform well for clinically low and high (+1 SD) FA levels and therefore its use should be complemented with formal assessment (e.g., clinical interviews). Accordingly, as approximately one third of participants scored above 2 SD and were at risk for presenting FA in the extreme range, further assessment should investigate the underlying causes or traits (e.g., narcissism; [21]) to provide more clarity on excessive levels of heightened FA. Last, items differ in their suitability to discriminate between participants with different levels of the latent trait.

Declarations
Ethical approval and consent to participate: Ethics approval granted by the Victoria University Ethics Committee (VUHREC). The current study only involved adult subjects (+18 years old) and informed consent was obtained in all cases. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication:
Not applicable.
Availability of data and materials: All data generated or analysed during this study are included in this published article [and its supplementary information files]. Data cannot be disseminated or distributed; they must be used to reproduce results only.

FAS Item Characteristic Curves (N = 386). These plots demonstrate how the probability of endorsing a category of FAS items (i.e., "strongly disagree" to "strongly agree") changes as levels of the latent trait change.