Establishing the optimal male cut-off point: confirmatory factor analysis of the eating disorder examination-questionnaire (EDE-Q) in a representative sample of Spanish university students

Although the EDE-Q is derived from the “gold standard” for the assessment of eating disorders (ED), its factor structure is controversial, particularly in male samples. The aim of the study was to examine the psychometric properties and factor structure of the EDE-Q, as well as to establish a sensitive and specific cut-off point validated by EDE clinical interview. A series of Confirmatory Factor Analyses were performed among a representative sample of 796 male university students, of whom 139 were interviewed. Sensitivity and specificity were calculated by receiver-operating characteristic (ROC) analysis to determine the most appropriate cut-off value. The original factor structure was not confirmed; a 2-factor solution showed a better fit. For the Spanish male sample, a cut-off ≥ 1.09 for at-risk of ED cases and ≥ 2.41 for clinical cases presents an optimal balance between sensitivity and specificity. The establishment of specific cut-off points for males may help to reduce the under-diagnosis of ED in this population. Level of evidence III: evidence obtained from a well-designed case–control study.


Introduction
The eating disorder examination questionnaire (EDE-Q) [1] is derived from the "gold standard" for the assessment and diagnosis of eating disorders (ED), the eating disorder examination interview (EDE) [2]. In addition to the cognitive features of ED, the EDE-Q includes items that assess behavioral symptoms such as self-induced vomiting, binge eating or excessive exercise, and is a brief, cost-efficient self-report measure for exploring the possible occurrence and severity of ED [3]. The EDE-Q has been widely used and has been shown to be a valid and reliable instrument in both general population and clinical samples [4][5][6].
The original theoretical structure of the EDE-Q comprises four factors (i.e., restraint, eating concern, weight concern and shape concern) and a global score. However, this structure has found little empirical support and is controversial [6,7]. Since its development, the factor model of the EDE-Q has been tested through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) in multiple studies with samples differing in nationality, age, sex, sexual orientation or clinical status [6,8]. In this regard, Rand-Giovanetti et al. [7] recently conducted a comprehensive review of 24 empirical investigations exploring the factorial structure and psychometric properties of the EDE-Q. Of all reviewed studies, only two confirmed Fairburn's original four-factor, 22-item factor structure. There are 12 different factor structures that retain the original 22 items, from unifactorial to alternative 4-factor proposals, as well as nine different factor structures for reduced-item models [7]. Factor reduction commonly involves merging the "Concern" subscales in some way [6].
In recent years, ED behaviors have been increasing faster in men than in women [9]. Although the rates of anorexia nervosa and bulimia nervosa are lower than in women [10], the dual nature of male body dissatisfaction (i.e., low body fat and high muscularity) places men at risk for disordered eating [9,11], with the average age of onset being around 19-20 years [12,13]. Despite the need for robust instruments to explore this pathology among men and reduce the underdiagnosis of male ED [9], research on the psychometric properties of the EDE-Q in men remains limited and male samples are generally scarce [6]. Recent studies on male samples show different factor structures for the EDE-Q [8], supporting the hypothesis that the original factor model is not sensitive enough for males, and more research is needed to clarify its weak and unstable structure [5,14].
Screening for low prevalence pathologies such as ED inherently involves challenges [15], particularly in male samples where results with the usual cut-off points may not be adequately detecting at-risk cases [16,17]. Sensitive and specific screening instruments validated with clinical interview are essential. In the Spanish population, studies that have examined the psychometric properties of the EDE-Q in males, including EFA and CFA, have been conducted mainly in adolescent samples [18,19]. The few studies that have collected adult samples have not conducted a clinical interview to contrast the findings [16,20].
This wide range of alternative EDE-Q versions may hinder communication between researchers and clinicians and affect the interpretation of findings [21]. The aim of this study is to extend the current literature by examining the psychometric properties and factor structure of the EDE-Q, as well as to establish a sensitive and specific cut-off point validated through EDE clinical interview in a large non-clinical representative sample of Spanish university men.

Participants
Data were obtained in an epidemiologic study of the prevalence of ED and Muscle Dysmorphia (MD) among male university students at the Autonomous University of Madrid, using a two-phase design with a control group. In the first phase, the sample was screened by questionnaire to detect students at risk of developing an ED or MD. In the second phase, clinical interviews were conducted with the risk group, who scored at or above a cut-off point of ≥ 4 on the EDE-Q-Global [1,22]. The control group was selected randomly from the students who scored below this cut-off point, using two-student pairing stratified by academic year and school. A total of 139 EDE interviews were conducted.
The survey was carried out in a sample of male students enrolled in the first and fourth academic years, during the academic courses from 2016 to 2019. Among the 21 schools on the campus, the five schools (i.e., physical activity and sports sciences, physics, computer science engineering, business administration and management, and economics) with the highest number of male students enrolled (i.e., over 70%) were selected. A total of 1634 students were targeted. To achieve a representative sample of the university campus by academic year and school, the sample design was proportionally stratified according to both variables, assuming a 95% confidence level and a sampling error of 0.05. A total of 1088 students was identified as the desired sample size [23]. The final sample collected consisted of 850 Spanish male university students (i.e., 78.1% response rate): (1) physical activity and sports sciences from polytechnic (n = 297, 91.1% response rate), (2) physics (n = 92, 96.8% response rate), (3) economics (n = 171, 77.7% response rate), (4) computer science engineering (n = 114, 49.6% response rate), and (5) business administration and management (n = 176, 81.1% response rate). The mean age of the sample was 19.8 (SD = 2.8).
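As a quick arithmetic check of the figures above, the following minimal sketch sums the per-school counts and recomputes the overall response rate. It assumes that the overall rate is taken relative to the desired sample size of 1088 (rather than the 1634 students targeted), which is the reading that reproduces the reported value; the dictionary keys and variable names are illustrative only.

```python
# Respondents per school, as reported in the text.
respondents = {
    "physical activity and sports sciences": 297,
    "physics": 92,
    "economics": 171,
    "computer science engineering": 114,
    "business administration and management": 176,
}

# Assumption: the overall response rate is computed against the
# desired sample size reported in the text.
desired_sample = 1088

total = sum(respondents.values())
overall_rate = 100 * total / desired_sample

print(total)         # 850
print(overall_rate)  # 78.125, reported as 78.1% in the text
```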
Specifically for this study, to examine the psychometric properties of the EDE-Q, 54 participants who had > 5% missing data on the questionnaire or were non-Spanish students were excluded, leaving a final sample of n = 796. Overall, 528 (66.3%) participants practiced sports at a non-competitive level (recreational exercisers) and the remaining 268 (33.7%) students competed at a national or international level in some sport (e.g., football, swimming, etc.). The mean body mass index (BMI) was 22.4 (SD = 2.9).
Tests were administered collectively in the classroom and completed individually in electronic or paper form. The interviews were conducted face-to-face by the research team in spaces provided by the University. Permission to conduct the study was granted by the university's deans and the participants' teachers.

Measures
In addition to providing self-reported sociodemographic data on age, weight and height, participants completed the following measures.
Eating disorder examination (EDE-12, Version 12.0) [2,24] is a clinical interview developed to measure ED psychopathology over a 28-day period. It comprises 35 questions, of which 22 items make up four subscales, all rated on the same seven-point response format (i.e., 0-6): Restraint, Eating Concern, Shape Concern, and Weight Concern.

The EDE-Global score is obtained by averaging the subscale scores. The presence and frequency of key ED behaviors are included in neither the subscales nor the global score; however, these questions are used for coding the ED diagnosis. The interview manual indicates the diagnostic items of the EDE used to establish a DSM-IV diagnosis, which has been adapted to DSM-5 diagnoses.
Eating disorder examination-questionnaire (EDE-Q) [1,16] is a 28-item measure that asks directly about attitudes related to key features of ED psychopathology in a 28-day time frame. The same four subscales of the EDE interview are calculated through 22 attitudinal items, and responses are given on a 7-point Likert-type scale (0 = never; 1 = 1-5 days; 2 = 6-12 days; 3 = 13-15 days; 4 = 16-22 days; 5 = 23-27 days; 6 = every day). The EDE-Q-Global score is obtained by averaging the subscale scores. The remaining items assess the frequency of specific eating behaviors that are not included in the subscales or in the global score (e.g., binge eating, self-induced vomiting, etc.). Studies of convergent validity comparing the EDE-Q with its interview equivalent (the EDE) have generally demonstrated good agreement between measures [25][26][27]. Cronbach's alpha was 0.93 for the EDE-Q-Global score for non-clinical men [10]. The EDE-Q has a clinical significance cut-off point of ≥ 4 for the EDE-Q-Global score for both sexes [22]. In the current sample, the Omega coefficient was 0.93 for the EDE-Q-Global score.
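The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming that each subscale score is the mean of its item ratings and that the global score is the mean of the subscale scores. The function name `score_ede_q`, the two-item toy subscales, and the response values are hypothetical and do not reproduce the actual EDE-Q item key.

```python
from statistics import mean


def score_ede_q(responses, subscales):
    """Return (subscale_scores, global_score).

    responses: dict mapping item id -> rating on the 0-6 scale.
    subscales: dict mapping subscale name -> list of item ids.
    Assumption: each subscale score is the mean of its item ratings.
    """
    for item, rating in responses.items():
        if not 0 <= rating <= 6:
            raise ValueError(f"item {item!r}: rating {rating} outside 0-6")
    subscale_scores = {
        name: mean(responses[i] for i in items)
        for name, items in subscales.items()
    }
    # Global score: average of the subscale scores, as described in the text.
    global_score = mean(subscale_scores.values())
    return subscale_scores, global_score


# Hypothetical item assignment (NOT the real EDE-Q key), two items per subscale.
toy_subscales = {
    "Restraint": ["r1", "r2"],
    "Eating Concern": ["e1", "e2"],
    "Shape Concern": ["s1", "s2"],
    "Weight Concern": ["w1", "w2"],
}
toy_responses = {"r1": 0, "r2": 2, "e1": 1, "e2": 1,
                 "s1": 4, "s2": 2, "w1": 3, "w2": 3}

subscale_scores, global_score = score_ede_q(toy_responses, toy_subscales)
print(subscale_scores)
print(global_score)
```

Passing the item-to-subscale mapping as an argument keeps the sketch independent of any particular factor structure, so the same function could score a four-factor or a two-factor grouping of the 22 attitudinal items.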
Muscle dysmorphic disorder inventory (MDDI) [28,29] is a 13-item questionnaire with a response range from 1 (never) to 5 (always) that evaluates body dissatisfaction related to muscle development from a male perspective. The MDDI is divided into three subscales: drive for size (DFS), appearance intolerance (AI), and functional impairment (FI). The original version showed adequate reliability indexes (α = 0.77-0.85), as did the Spanish version (α = 0.73-0.85). In the current sample, the Omega coefficient was 0.89 for the MDDI total score.

Data analysis
Statistical analyses were carried out using SPSS 25.0, Mplus 7.11, and RStudio, employing the MVN [30], psych [31], and ROCit [32] packages. Descriptive statistics (M ± SD) were calculated for all scale scores. Mardia's test revealed that the EDE-Q items did not follow a multivariate normal distribution (skewness = 38,873.39, p < 0.001; kurtosis = 256.93, p < 0.001). Since the data were categorical and followed a non-normal distribution, CFAs were performed using robust unweighted least squares (ULSMV), which is an efficient method to deal with this kind of data (e.g., see [33]). Items were set to load freely, except for one item per factor, whose loading was fixed to 1 to ensure an identified model.
The models under investigation were as follows: Model I: four-factor structure [1]; Model II: three-factor structure that retains two EDE-Q subscales (Restraint, Eating Concern) but collapses weight and shape concern items [34]; Model III: two-factor model that retains one EDE-Q subscale (Restraint) but collapses eating, weight and shape concern items [35]; and Model IV: a unidimensional model for all 22 EDE-Q subscale items [7].
Several fit indexes were considered: the root mean square error of approximation (RMSEA) and its 90% confidence interval, the Tucker Lewis Index (TLI), and the comparative fit index (CFI). CFI and TLI values close to 0.90 and RMSEA values < 0.08 were considered indicative of good fit [36,37]. A Chi-square difference test (Δχ²) was used to compare models. Given the Likert-type nature of the EDE-Q [38], internal consistency was assessed through the Omega coefficient [39]; values of ≥ 0.80 were considered adequate [40].
Kolmogorov-Smirnov tests showed that none of the scale scores were normally distributed. Therefore, concurrent validity was assessed using Spearman correlations with the MDDI scores. We also carried out sensitivity and specificity analyses for the EDE-Q total score among the 139 males interviewed with the EDE.
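To illustrate the internal-consistency criterion mentioned above (Omega ≥ 0.80), the sketch below computes one common formulation of McDonald's omega for a single-factor model from standardized loadings. It assumes uncorrelated residuals, so that each residual variance equals 1 minus the squared loading; the psych package used in the study may apply a different estimator. The function name and the loadings are hypothetical and are not taken from the study.

```python
def omega_from_loadings(loadings):
    """McDonald's omega for a one-factor model with standardized loadings.

    Assumption: residuals are uncorrelated, so each residual variance
    is 1 - loading**2.
    """
    common = sum(loadings) ** 2                          # (sum of loadings)^2
    residual = sum(1 - value ** 2 for value in loadings)  # sum of residual variances
    return common / (common + residual)


# Hypothetical standardized loadings for a four-item scale.
hypothetical_loadings = [0.8, 0.8, 0.8, 0.8]
omega = omega_from_loadings(hypothetical_loadings)

print(round(omega, 3))
print(omega >= 0.80)  # criterion for adequate internal consistency
```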
We used receiver-operating characteristic (ROC) curves to determine the optimum cut-off score for males with an ED diagnosis and for males at risk of ED according to EDE interview criteria, using Youden's index, which indicates the balance between sensitivity and specificity. We estimated the Area Under the Curve (AUC) to assess the discrimination quality. In general, AUC values of 0.70-0.80 are considered acceptable, 0.80-0.90 good, and 0.90-1.00 excellent [41]. Finally, we calculated the sensitivity as the true positive rate, the specificity as the true negative rate, the positive predictive value (PPV), and the negative predictive value (NPV).
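The cut-off selection described above can be sketched in a few lines. This is an illustrative pure-Python version, not the ROCit-based R code used in the study: it assumes that a score at or above the cut-off counts as a positive screen, evaluates every observed score as a candidate cut-off, and keeps the one that maximizes Youden's index (sensitivity + specificity - 1). The function names and the toy data are hypothetical.

```python
def screening_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV, NPV and Youden's index for one cut-off.

    labels: 1 = case according to the reference interview, 0 = non-case.
    Assumption: 'score >= cutoff' is treated as a positive screen.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "youden": sensitivity + specificity - 1,
    }


def best_cutoff(scores, labels):
    """Return the observed score that maximizes Youden's index."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: screening_metrics(scores, labels, c)["youden"])


# Hypothetical global scores and interview outcomes (not study data).
toy_scores = [0.2, 0.5, 0.9, 1.2, 1.5, 2.0, 2.6, 3.1]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff = best_cutoff(toy_scores, toy_labels)
metrics = screening_metrics(toy_scores, toy_labels, cutoff)
print(cutoff)
print(metrics)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of cases in the sample, so values from an interviewed subsample enriched with high scorers would not carry over directly to the screened population.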

Results
The first CFA (Model I) tested the fit of the original theoretical proposal [1]. This analysis yielded a non-positive definite matrix solution, indicating that this model was not acceptable. Fit statistics for the remaining models (II-IV) are presented in Table 1. Model IV showed poor fit to our data, while models II and III resulted in acceptable fit. The Chi-squared difference test (Δχ²) revealed that Model-II showed better fit than Model-I. However, although Δχ² indicated that Model-II showed better fit than Model-III, the difference in the remaining fit indexes was small (e.g., ΔTLI < 0.01 [42]). Thus, we retained Model-III, as it offered the best balance between fit and parsimony (see Fig. 1). The retained two-factor model shows significant, moderate to high positive correlations with the EDE interview (see Table 2).
Mean scores, standard deviations, and internal consistency, as well as correlations among EDE-Q and MDDI scores, are presented in Table 3. The EDE-Q and its subscales showed acceptable to excellent internal consistency, with omega coefficient values above 0.70 for the Restraint subscale, and above 0.90 for the remaining scales.
Regarding concurrent validity, the MDDI total score showed significant, moderate positive correlations with the EDE-Q and its subscales (0.35-0.52). There were also moderate to high positive correlations between the MDDI-AI subscale and the EDE-Q, especially with the Eating-Weight-Shape Concern (EWSC) subscale (ρ = 0.62; p < 0.01) and the total score (ρ = 0.60; p < 0.01). On the other hand, the EDE-Q showed low to moderate positive correlations with the MDDI-DFS and the MDDI-FI subscales (see Table 3).
ROC curves are presented in Figs. 2 and 3. The figures show the optimum cut-off scores for males with an ED diagnosis and for males at risk of ED, respectively. As shown in both figures, these cut-off points were determined using Youden's index. The cut-off score for males with an ED diagnosis was an EDE-Q-Global score = 2.41, whereas for males at risk of ED it was an EDE-Q-Global score = 1.09.
Descriptive statistics for each group and AUC are presented in Table 4. As shown, the AUC results indicate that the probability that a male with an ED obtains a higher EDE-Q-Global score than a control male is 76.9%, whereas the probability that a male at risk of ED obtains a higher EDE-Q-Global score than a control male is 77.4%. These values are considered acceptable. Sensitivity, specificity, PPV and NPV values for the optimum cut-off scores are presented in Table 5, alongside the values for the classic EDE-Q diagnostic score (i.e., EDE-Q total ≥ 4.00).
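The probabilistic reading of the AUC given above can be made concrete with a small sketch: the AUC equals the proportion of all case/control pairs in which the case has the higher score, with ties counted as one half. The function name and the toy scores are hypothetical and are not the study data.

```python
def auc_pairwise(case_scores, control_scores):
    """AUC as the share of case/control pairs in which the case scores higher.

    Ties contribute one half, which matches the usual rank-based definition.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical global scores (not study data).
toy_cases = [1.5, 2.6, 3.1]
toy_controls = [0.2, 0.5, 0.9, 1.2, 2.0]

auc = auc_pairwise(toy_cases, toy_controls)
print(auc)  # 14 of the 15 pairs favor the case
```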

Discussion
To our knowledge, this is the first study to examine the factor structure of the EDE-Q while contrasting the findings with the EDE interview in a representative university sample of Spanish men, in order to establish a cut-off point for optimal ED detection in this population.
In the present study, the EDE-Q shows good internal consistency, with high Omega coefficient values for both the total scale and the two subscales of the EDE-Q, similar to those obtained in other studies [5,17]. Therefore, the EDE-Q is shown to be a valid and reliable instrument for use as a screening tool in Spanish males.
Consistent with other studies [7,43], the original four-factor structure of the EDE-Q was not confirmed. For the Spanish male sample, the EDE-Q showed a better fit to a two-factor solution with a Restraint subscale and a Weight-Shape-Eating Concern subscale, without removing any items [35]. Both the EDE interview and the EDE-Q were constructed on a rational basis to represent the key psychopathology of eating disorders. Subsequent factor studies, however, mostly do not support the initial structure [1]. These first theoretical approaches are female-centric and may not fit the male perspective, in which cognitive aspects seem to belong to a single dimension of body image concern.
Given the dual nature of male body dissatisfaction and its associated behaviors, it is suggested that the EDE-Q be used in males in conjunction with more specific measures of male body reality [44]. The EDE-Q shows moderate to high convergence with the MDDI, indicating an overlap between ED and MD symptomatology, except for the Restraint subscale which shows a lower association with the Drive for Size subscale of the MDDI. This difference is not surprising, as this subscale is aimed at exploring the desire for muscle mass gain, for which dietary restriction is counterproductive [9,11].
The EDE-Q mean scores observed in our sample are consistent with those obtained in research with similar male samples in Spain [16,19] and in other countries [43][44][45][46][47]. In general, men score lower than women in studies using the EDE-Q [15,20,45,47,48]. However, this does not necessarily imply that there is no ED symptomatology in males, and it has been repeatedly questioned whether the cut-off points established for the questionnaires entail a risk of under-diagnosis [9,46]. This risk is particularly salient when exploring male samples, where body image concerns and behaviors differ from those of women, and the difficulty of detecting at-risk cases is greater [9,11]. The use of the initially proposed cut-off ≥ 4 [22] as a marker of clinical significance has been criticized in the literature, which suggests a downward adjustment for both female [20] and male samples [16]. In fact, studies using ROC curve analyses contrasting EDE-Q scores with EDE interview scores point in this direction. In female samples there is variability in the cut-off proposals for the EDE-Q (EDE-Q-Global Score range: 1.98-2.80) [15,49,50,51], all below the Carter et al. [22] proposal, including  [46] is also far from the original proposal. The analysis performed in the present study suggests that, for the Spanish male sample, a cut-off ≥ 1.09 for at-risk of ED cases and ≥ 2.41 for clinical cases presents an optimal balance between sensitivity and specificity.
Table 1 Fit index values for the tested models (n = 796). All models were tested on all participants (n = 796). Fit indexes for Model I are not presented, as this model was deemed unacceptable due to a not positive definite matrix.
In conclusion, there are gender differences in levels of eating pathology that are indicative of clinical concern [46]. However, most research using the EDE-Q, including many studies in recent years, continues to use a cut-off ≥ 4 in males [3,5], leaving significant numbers of potentially at-risk participants undetected and, therefore, untreated. In men, body image and eating pathology are more complex, and the EDE-Q is limited in detecting muscle-oriented eating risk behaviors. In this sense, the development and further examination of a modified muscle-oriented version of the EDE-Q [52] that captures the domains of disordered eating relevant to males may be promising.

Strengths and limitations
The main strength of the study is its large sample size and the representativeness of the sample of undergraduate male students in Spain. Conducting clinical interviews in research is an indispensable requirement to contrast the results of the questionnaires and establish a correct diagnosis. Because their high cost makes them difficult to carry out, the high number of clinical interviews conducted is another important strength of the study. Nevertheless, the results should be interpreted in light of some limitations. The fact that a significant percentage of the sample regularly engages in sport or recreational exercise may affect the results, which may not be generalizable to samples of less physically active adult males. In this sense, although data on the level of sport participation were collected, no invariance analyses have been carried out in this respect. Also, no data were collected on the ethnic or sexual diversity of the participants, so the results obtained in university students cannot be generalized to other samples of males.

What is already known on this subject?
The factor structure of the EDE-Q has been explored in different samples with contradictory results. In the male population in particular, the interview-based cut-off point is not sufficiently sensitive, aggravating the problem of underdiagnosis.

What does this study add?
Our study explores the factor structure of the EDE-Q in a large representative sample of males, who also participated in a clinical interview. Our results provide the scientific community with a sensitive and specific cut-off point of the EDE-Q for males and represent a potential advance in the detection of ED in Spanish-speaking males.
Funding
No funds, grants, or other support was received.

Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
Ethical approval was obtained from the Research Ethics Committee of the Autonomous University of Madrid (UAM, CEI-75-1368). All procedures performed in this study involving human participants were in accordance with the ethical standards and with the Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent
Written informed consent was obtained from all surveyed participants.