Participants and procedure
Participants were invited to take part in the study via an online survey, implemented in SoSciSurvey, that consisted of several questionnaires. The link was posted on social media platforms and in online cancer support groups as part of a validation study. All participants gave their informed consent online. Inclusion criteria were age ≥ 18 years and a current or past cancer diagnosis. No exclusion criteria were defined. In total, N = 350 cancer patients (283 women (80.9%), 66 men (18.9%), 1 gender-diverse person (0.3%)) who completed the 12-item version of the WHODAS 2.0 were included.
We received permission from the WHO to use the WHODAS 2.0 (License: CC BY-NC-SA 3.0 IGO). All procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. The work was approved by the Ethics Commission of the University's Faculty of Medicine (reference number 18-098).
WHODAS 2.0. Global health status was assessed using the German version of the 12-item self-administered WHODAS 2.0. The scale is an established and validated tool for assessing difficulties in functioning in six domains (understanding and communicating, mobility, self-care, getting along, life activities, and participation). Participants rate how much difficulty they have had in performing various activities in the last 30 days on a 5-point Likert-type scale (none = 0, mild, moderate, severe, extreme/cannot do = 4). Higher scores reflect greater disability.
HADS. Self-reported distress was assessed using the German version of the Hospital Anxiety and Depression Scale (HADS). The scale is an established tool for the assessment of anxiety and depression in cancer patients [22,23]. The HADS consists of 14 items with a total score (HADS-T) ranging from 0 to 42. Subscale scores for depression and anxiety can additionally be calculated. Higher scores on the HADS indicate more severe depression and anxiety. To identify patients with an increased need for psycho-oncological care, and especially those with depressive symptoms, a sum score of HADS-T ≥ 15 can be used as the cut-off value.
Data were analyzed using SPSS version 26.0 and RUMM2030 software. Patients’ characteristics are described by means and standard deviations. One patient had one missing item on the 12-item WHODAS 2.0; this value was replaced by the mean of the other items, as recommended by Üstün et al.
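The replacement of a single missing item can be illustrated with a minimal Python sketch. It assumes that "the mean of the other items" refers to the mean of the same patient's remaining items and that the imputed value is not rounded; neither detail is specified above, and the function name and example data are hypothetical.

```python
import numpy as np


def impute_person_mean(responses):
    """Replace missing item responses (NaN) with the mean of the
    same person's answered items (assumption: no rounding)."""
    responses = np.asarray(responses, dtype=float)
    filled = responses.copy()
    # Mean of the answered items for each person (row), ignoring NaN.
    row_means = np.nanmean(responses, axis=1)
    missing_rows, missing_cols = np.where(np.isnan(responses))
    filled[missing_rows, missing_cols] = row_means[missing_rows]
    return filled


# Hypothetical 12-item responses (0-4); the first person skipped the last item.
example = [
    [0, 1, 2, 1, 0, 0, 1, 2, 3, 1, 0, np.nan],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
completed = impute_person_mean(example)
```

Complete rows are returned unchanged, so the function can be applied to the whole data matrix.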
IRT methodology was used to assess the psychometric properties of the WHODAS 2.0 in an oncological context. IRT models, including Rasch models, allow a nuanced analysis of an instrument's psychometric properties because they focus on the items and on how persons respond to them. Person parameters are estimated that express each individual's level on a latent trait, which in the case of the WHODAS 2.0 is disability. Item difficulty parameters are estimated on the same latent trait. ‘Easy’ WHODAS items are items that are scored high in the direction of disability even by patients with only minor disabilities, whereas ‘difficult’ WHODAS items are scored high only by patients with major disabilities. During the item analysis according to the Rasch model, it is tested whether patients respond to each item as expected. For example, a patient with major disabilities should also score high on an ‘easy’ WHODAS item. In order to properly test the fit of the WHODAS data to the Rasch model, this paper follows the current state-of-the-art requirements for Rasch analysis and the CREATE guidelines for reporting valuation studies.
Given the polytomous WHODAS items, the Partial Credit Model (PCM) was used. An analysis according to the Rasch model investigates how well the data meet the expectations of the measurement model, i.e., unidimensionality, local independence, and absence of differential item functioning (DIF). In this sense, the analysis can be understood as an iterative process in which potential deviations from the model’s expectations are investigated and, if possible, resolved.
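To make the PCM concrete, the following Python sketch computes the category probabilities of a single five-category item from a person parameter and four item thresholds, using the standard PCM formula. It is illustrative only and is not the estimation routine of RUMM2030; the function names and threshold values are hypothetical.

```python
import numpy as np


def pcm_category_probabilities(theta, thresholds):
    """Probability of each response category 0..m under the Partial
    Credit Model for a person at `theta` and an item with m thresholds."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Category x has the exponent sum_{k<=x} (theta - tau_k);
    # category 0 has an empty sum, i.e. 0.
    exponents = np.concatenate(([0.0], np.cumsum(theta - thresholds)))
    exponents -= exponents.max()  # guard against overflow
    weights = np.exp(exponents)
    return weights / weights.sum()


def pcm_expected_score(theta, thresholds):
    """Model-expected item score for a person at `theta`."""
    probs = pcm_category_probabilities(theta, thresholds)
    return float(np.dot(np.arange(len(probs)), probs))


# Hypothetical thresholds (in logits) for one item scored 0-4.
item_thresholds = [-2.0, -1.0, 1.0, 2.0]
probs_average_person = pcm_category_probabilities(0.0, item_thresholds)
expected_low = pcm_expected_score(-1.5, item_thresholds)
expected_high = pcm_expected_score(1.5, item_thresholds)
```

With these thresholds, a person with a higher disability parameter has a higher expected item score, which is the pattern against which observed responses are compared during the fit analysis.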
One fundamental requirement of the Rasch model is unidimensionality, i.e., the items of a scale should assess only one underlying construct. Unidimensionality was tested with a principal component analysis (PCA) of the residuals. The idea is to use the items with the highest negative and the highest positive loadings on the first residual factor to create two subsets of items. Person estimates are derived separately from each subset and compared for each person using independent t-tests. To confirm unidimensionality, the proportion of significant t-tests should not exceed 5%.
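The comparison of the two subset-based person estimates can be sketched as follows. The sketch assumes that each person's estimate and standard error are available for both subsets and that a critical value of 1.96 is used to declare a difference significant; the critical value, the function name, and the example values are assumptions and are not taken from the analysis reported here.

```python
import numpy as np


def proportion_significant_t(theta_a, se_a, theta_b, se_b, critical=1.96):
    """Share of persons whose two subset-based estimates differ
    significantly (|t| > critical)."""
    theta_a, se_a, theta_b, se_b = (
        np.asarray(x, dtype=float) for x in (theta_a, se_a, theta_b, se_b)
    )
    # Per-person t statistic for the difference between two estimates.
    t_values = (theta_a - theta_b) / np.sqrt(se_a**2 + se_b**2)
    return float(np.mean(np.abs(t_values) > critical))


# Hypothetical estimates (logits) for four persons on the two item subsets.
share = proportion_significant_t(
    theta_a=[0.0, 1.0, -1.0, 2.0],
    se_a=[0.5, 0.5, 0.5, 0.5],
    theta_b=[0.1, 0.9, -1.2, 0.0],
    se_b=[0.5, 0.5, 0.5, 0.5],
)
```

In this toy example only the fourth person shows a significant difference, so the share is 0.25; in a real analysis the share would be compared against the 5% criterion.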
Another assumption is that of local independence. This implies that there should be no residual correlations between items once the trait factor has been extracted. Locally dependent items, i.e., items that are linked in some way, can lead to overestimation of reliability, parameter estimation bias, and problems with construct validity. Following the recommendations of Christensen et al. and Marais, a cut-off value of 0.2 above the average residual correlation was used to assess local dependence (LD). There are different strategies to deal with LD, such as combining the locally dependent items into testlets by adding them together.
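The relative cut-off for LD can be illustrated with a short Python sketch. It assumes a persons-by-items matrix of standardized residuals is available (for example, exported from the Rasch software) and flags item pairs whose residual correlation exceeds the average off-diagonal correlation by more than 0.2. The function name and the simulated residuals are hypothetical.

```python
import numpy as np


def flag_local_dependence(residuals, margin=0.2):
    """Return the cut-off and the item pairs whose residual correlation
    lies more than `margin` above the average residual correlation."""
    residuals = np.asarray(residuals, dtype=float)
    corr = np.corrcoef(residuals, rowvar=False)  # items are columns
    upper = np.triu_indices(corr.shape[0], k=1)  # each pair once
    cutoff = corr[upper].mean() + margin
    flagged = [
        (int(i), int(j)) for i, j in zip(*upper) if corr[i, j] > cutoff
    ]
    return float(cutoff), flagged


# Simulated residuals for 200 persons and 4 items; items 0 and 1 are
# constructed to be strongly dependent.
rng = np.random.default_rng(0)
demo = rng.standard_normal((200, 4))
demo[:, 1] = demo[:, 0] + 0.1 * rng.standard_normal(200)
demo_cutoff, demo_flagged = flag_local_dependence(demo)
```

Flagged pairs would then be candidates for a remedy such as the testlet approach described above.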
A further assumption is that there is no item bias with regard to exogenous variables (no DIF). If DIF is present, the difficulty of an item differs between groups (e.g., males and females). In other words, the corresponding item indicates the latent trait in different ways in different groups. We tested the items for DIF with respect to gender (woman, man), age (median split of the sample: below and above 54), type of cancer (breast, other forms of cancer, multiple cancers), presence of metastases (yes, no, unknown), psycho-oncological support (yes, no), and duration of disease (median split of the sample: below and above 3.9 years). Where DIF was found, we evaluated its impact by computing equated scores. Because the group sizes were too small, we had to exclude the one gender-diverse person from the DIF analysis of gender and combine the remaining cancer types into one category, 'other forms of cancer', for the DIF analysis of cancer type.
Additionally, item fit, indicated by standardized residuals within a range of ± 2.5, and overall model fit, indicated by a non-significant chi-square probability (p > 0.01), were investigated [31,36]. Moreover, the ordering of item thresholds was analyzed. Item thresholds are the transition points between two adjacent response categories.
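Both item-level checks can be expressed compactly. The following Python sketch assumes that fit residuals and threshold estimates per item have already been obtained from the Rasch software; it flags items whose fit residual lies outside ± 2.5 and items whose thresholds do not increase strictly from one category transition to the next. The item names and values are hypothetical.

```python
def check_items(fit_residuals, thresholds, residual_limit=2.5):
    """Return (misfitting items, items with disordered thresholds)."""
    misfitting = [
        item for item, residual in fit_residuals.items()
        if abs(residual) > residual_limit
    ]
    # Thresholds are ordered if each one is larger than the previous one.
    disordered = [
        item for item, taus in thresholds.items()
        if any(later <= earlier for earlier, later in zip(taus, taus[1:]))
    ]
    return misfitting, disordered


# Hypothetical estimates for three items scored 0-4 (four thresholds each).
example_residuals = {"item_1": 0.4, "item_2": -2.9, "item_3": 1.7}
example_thresholds = {
    "item_1": [-1.8, -0.4, 0.9, 2.1],
    "item_2": [-1.2, 0.3, 0.1, 1.9],  # second and third thresholds reversed
    "item_3": [-2.0, -0.5, 0.6, 1.4],
}
misfitting_items, disordered_items = check_items(
    example_residuals, example_thresholds
)
```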
The scale's internal consistency was estimated using the Person Separation Index (PSI). The PSI is comparable to Cronbach's alpha and can be interpreted similarly, with a required minimum value of .7 for group use and .85 for individual use. Targeting was assessed graphically based on the person-item threshold distribution graph. Person-item maps show how person parameters and item thresholds are distributed along the trait dimension.
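For illustration, the PSI can be computed from person estimates and their standard errors as the proportion of the observed variance of the estimates that is not attributable to measurement error. The sketch below uses this common definition with the sample variance (ddof = 1); the exact variance estimator used by RUMM2030 is not stated here, so that choice, the function name, and the example values are assumptions.

```python
import numpy as np


def person_separation_index(theta, se):
    """PSI = (variance of person estimates - mean error variance)
    / variance of person estimates."""
    theta = np.asarray(theta, dtype=float)
    se = np.asarray(se, dtype=float)
    observed_variance = theta.var(ddof=1)  # assumption: sample variance
    error_variance = np.mean(se**2)
    return float((observed_variance - error_variance) / observed_variance)


# Hypothetical person estimates (logits) with equal standard errors.
psi = person_separation_index(
    theta=[-2.0, -1.0, 0.0, 1.0, 2.0],
    se=[0.5, 0.5, 0.5, 0.5, 0.5],
)
```

In this toy example the observed variance is 2.5 and the mean error variance is 0.25, giving a PSI of 0.9, which would satisfy both the group and the individual criterion.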