Validation of the Simplified Chinese Version of FACT-Hep for Patients with Hepatocellular Carcinoma Based on Combinations of Classical Test Theory and Generalizability Theory

Quality of life (QOL) has become a worldwide concern in clinical oncology, and the specific instrument FACT-Hep (Functional Assessment of Cancer Therapy-Hepatobiliary questionnaire) is widely used in English-speaking countries. However, specific instruments for patients with hepatocellular carcinoma in China are scarce, and no formal validation of the Simplified Chinese version of FACT-Hep has been carried out. This study aimed to validate the Chinese FACT-Hep using a combination of Classical Test Theory and Generalizability Theory.

year survival being less than 20% [6][7]. Therefore, researchers and clinicians tend to pay more attention to the quality of life (QOL) of patients with HCC, because of the long course of the disease and the difficulty of curing it.
Over the past 20 years, QOL assessments have been used as significant outcomes for patients with HCC [2,8,9,10]. Several specific instruments have accordingly been developed, such as the Functional Assessment of Cancer Therapy (FACT) Hepatobiliary (FACT-Hep) questionnaire [11,12], the QLQ-HCC18 from the European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Group [13,14], the National Comprehensive Cancer Network Functional Assessment of Cancer Therapy (NCCN-FACT) Hepatobiliary-Pancreatic Symptom Index (NFHSI) [15], the Nine-Item Chinese Patient Satisfaction Questionnaire (ChPSQ-9) [16], and the Quality of Life Instrument for Patients with Liver Cancer (QOL-LC) [17,18]. Of these, the FACT-Hep is a 45-item self-report instrument developed specifically to measure QOL in patients with not only HCC but also pancreatic, biliary and metastatic liver cancer [11,12].
Very few QOL instruments have been developed for and applied in the Chinese population, the largest population in the world. In particular, QOL instruments developed for liver cancer are scarce. Therefore, the Simplified Chinese version of FACT-Hep (V4.0) for HCC was developed by the Center on Outcomes, Research and Education (CORE). However, this scale has not been formally validated for use in mainland China. Our study aimed to evaluate the psychometric properties, especially the validity, of FACT-Hep (V4.0) for HCC in mainland China. Because there is no gold standard, we used the QLICP-LI (Quality of Life Instruments for Cancer Patients-Liver cancer), developed by our research group [19], to evaluate criterion-related validity. The QLICPs (Quality of Life Instruments for Cancer Patients) are a system of Chinese QOL instruments developed with a modular approach: a general module (QLICP-GM) used for all types of cancer is combined with different specific modules for different cancers [20][21][22][23][24]. For example, the QLICP-CR combines the QLICP-GM with the specific module for colorectal cancer [21]. Similarly, the QLICP-BR, QLICP-HN and QLICP-LU combine the QLICP-GM with the specific modules for breast cancer, head and neck cancer, and lung cancer, respectively [22][23][24].

Instruments And Scoring
Like the original, the Simplified Chinese version of FACT-Hep (V4.0) consists of two parts: the general module for all cancers (FACT-G) and the hepatobiliary cancer-specific module of additional concerns (HCS). The FACT-G includes four domains, i.e., physical well-being (PWB, 7 items), social/family well-being (SFWB, 7 items), emotional well-being (EWB, 6 items), and functional well-being (FWB, 7 items). It assesses symptoms and other QOL concerns. The HCS is an 18-item, disease-specific hepatobiliary cancer subscale. It assesses back and stomach pain, gastrointestinal symptoms, anorexia, weight loss and jaundice in patients with hepatobiliary cancers.
Each item in FACT-Hep is rated on a five-point Likert-type scale (0-4 points). Positively stated items are scored directly from 0 to 4, whereas the scores of negatively stated items are reversed. We summed the item scores within each domain to obtain the domain score, and then summed the five domain scores to obtain the overall scale score [11,12]. In this way, a higher score indicates better QOL.
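The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the official FACT-Hep scoring syntax: reversal is assumed to be 4 minus the raw response, the item codes and the set of negatively stated items in the toy example are hypothetical placeholders, and any handling of missing items is not covered.

```python
MAX_RESPONSE = 4  # each item is rated 0-4


def item_score(response, negatively_stated):
    """Scored value of one 0-4 response; negatively stated items are reversed (4 - response)."""
    if not 0 <= response <= MAX_RESPONSE:
        raise ValueError(f"response out of range: {response}")
    return MAX_RESPONSE - response if negatively_stated else response


def score_instrument(responses, domains, reversed_items):
    """responses: item code -> raw 0-4 answer.
    domains: domain name -> list of item codes in that domain.
    reversed_items: set of codes for negatively stated items.
    Returns (domain scores, overall score = sum of domain scores)."""
    domain_scores = {
        name: sum(item_score(responses[code], code in reversed_items) for code in codes)
        for name, codes in domains.items()
    }
    return domain_scores, sum(domain_scores.values())


# Toy example with two hypothetical domains of two items each (FACT-Hep itself has five domains).
toy_domains = {"PWB": ["P1", "P2"], "EWB": ["E1", "E2"]}
toy_responses = {"P1": 0, "P2": 3, "E1": 4, "E2": 1}
scores, total = score_instrument(toy_responses, toy_domains, reversed_items={"P1", "E2"})
print(scores, total)  # {'PWB': 7, 'EWB': 7} 14
```

Keeping the reversal in a single helper makes it easy to swap in the real list of negatively stated items from the scoring guide without touching the summation logic.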
The structure and scoring method of QLICP-LI (V2.0) are very similar to those of FACT-Hep [19]: it consists of a general module (QLICP-GM) and a 12-item disease-specific domain (SLI1-SLI12). The QLICP-GM includes four domains with 32 items: physical (8 items, GPH1-GPH8), psychological (9 items, GPS1-GPS9), social (8 items, GSO1-GSO8), and common symptoms and side effects of cancer (7 items, GSS1-GSS7). Each domain score was obtained by summing the scores of the items in that domain, and the overall score was obtained by summing the five domain scores. All domain and overall scores were linearly converted to a 0-100 scale using the formula SS = (RS - Min) × 100/R, where SS, RS, Min and R represent the standardized score, raw score, minimum score, and range of scores, respectively. As with FACT-Hep, a higher score indicates better QOL in QLICP-LI (V2.0).
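A sketch of the 0-100 linear transformation, assuming Min is the smallest possible raw score for the domain and R is the difference between the largest and smallest possible raw scores. The example assumes, for illustration only, that each QLICP-LI item is scored 1-5, so the 12-item specific domain would have Min = 12 and Max = 60; the text above does not state the item range.

```python
def standardized_score(raw_score, min_score, max_score):
    """Convert a raw domain score to 0-100: SS = (RS - Min) * 100 / R, where R = Max - Min."""
    score_range = max_score - min_score
    if score_range <= 0:
        raise ValueError("max_score must be greater than min_score")
    if not min_score <= raw_score <= max_score:
        raise ValueError("raw_score lies outside the possible range")
    return (raw_score - min_score) * 100.0 / score_range


# Hypothetical example: a 12-item domain with items assumed to be scored 1-5.
n_items, item_min, item_max = 12, 1, 5
print(standardized_score(42, n_items * item_min, n_items * item_max))  # 62.5
```

Because the transformation is linear, it changes only the scale of a domain score, not its correlations with other scores.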

Data Collection
This study recruited inpatients diagnosed with HCC at the Yunnan Tumor Hospital (the Third Affiliated Hospital of Kunming Medical University). The study protocol and the informed consent form were approved by the IRB (institutional review board) of the investigators' institutions and the hospital. The inclusion criterion was that patients were able to read and understand the questionnaires, because the questions concerned their self-perceived, subjective evaluation of QOL. The investigators explained the study and the scales to HCC inpatients at any disease stage and receiving any treatment. In total, 114 HCC inpatients who met the criterion gave their consent to participate in our study.
Participants completed the questionnaire on admission to the hospital and again at discharge, after approximately 4 weeks of treatment, to evaluate the responsiveness of the questionnaire. Some patients with a stable disease course were asked to complete the questionnaire again one or two days after admission to assess the test-retest reliability of the Simplified Chinese version of FACT-Hep.

Data analysis
The validity, reliability, and responsiveness of the Simplified Chinese version of the FACT-Hep were evaluated.
Construct validity reflects the domain structure of FACT-Hep. In the present study, we evaluated item convergent and discriminant validity, which represent construct validity, using multi-trait scaling analysis [25]. Pearson correlations were used to examine the item-domain (subscale) correlations, which were interpreted according to two criteria: (1) convergent validity is supported when an item's correlation with its own domain is 0.40 or greater; (2) discriminant validity is supported when an item's correlation with its own domain is higher than its correlations with the other domains.
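The two multi-trait scaling criteria can be checked with a short NumPy sketch. It assumes each domain score is the plain sum of its items; the text does not say whether item-own-domain correlations were corrected for overlap, so no correction is applied here. The data are simulated, not the study data, and the function name is illustrative.

```python
import numpy as np


def multitrait_scaling(items, domain_of_item, threshold=0.40):
    """Check item convergent and discriminant validity by multi-trait scaling.

    items: (n_persons, n_items) array of scored responses.
    domain_of_item: domain label for each column of `items`.
    Returns one tuple per item: (r with own domain, convergent ok, discriminant ok).
    """
    labels = np.asarray(domain_of_item)
    domains = sorted(set(domain_of_item))
    # Domain score = plain sum of the domain's items (no overlap correction).
    domain_scores = {d: items[:, labels == d].sum(axis=1) for d in domains}
    results = []
    for j, own in enumerate(domain_of_item):
        # Pearson correlation of item j with every domain score.
        r = {d: np.corrcoef(items[:, j], s)[0, 1] for d, s in domain_scores.items()}
        convergent = bool(r[own] >= threshold)
        discriminant = all(r[own] > r[d] for d in domains if d != own)
        results.append((float(r[own]), convergent, discriminant))
    return results


# Simulated example: two latent traits, three noisy items each.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 300))
data = np.column_stack(
    [a + 0.5 * rng.normal(size=300) for _ in range(3)]
    + [b + 0.5 * rng.normal(size=300) for _ in range(3)]
)
for r_own, conv, disc in multitrait_scaling(data, ["A"] * 3 + ["B"] * 3):
    print(f"r = {r_own:.2f}, convergent: {conv}, discriminant: {disc}")
```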
Because there is no gold standard, criterion-related validity was evaluated by correlating the corresponding domains of FACT-Hep and QLICP-LI. Relatively high correlations between conceptually related domains and relatively low correlations between conceptually distinct domains would suggest good criterion-related validity. These correlations with QLICP-LI can also reveal convergent and discriminant validity to some extent.
The internal consistency of each domain was estimated by Cronbach's alpha coefficient, calculated from the data collected at the first measurement because of its relatively large sample size. An alpha coefficient greater than 0.7 indicates acceptable reliability [26]. Test-retest reliability between the first and second assessments was assessed by Pearson's correlation and by the intra-class correlation coefficient (ICC), defined as absolute agreement for a single measure under a two-way mixed model [27].
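A NumPy sketch of the two reliability statistics follows. Cronbach's alpha uses the usual variance formula; the ICC function implements the absolute-agreement, single-measure coefficient, ICC(A,1), from two-way ANOVA mean squares, whose computing formula is the same under the two-way random and two-way mixed models. Function names and the example scores are illustrative assumptions, not study data.

```python
import numpy as np


def cronbach_alpha(items):
    """items: (n_persons, k_items). alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)


def icc_absolute_single(ratings):
    """ratings: (n_subjects, k_occasions), e.g. k = 2 for test and retest.
    ICC(A,1) = (MSR - MSE) / (MSR + (k-1) MSE + k/n (MSC - MSE))."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)   # between occasions
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err))


# Hypothetical test-retest scores: the retest is shifted up by 2 points for everyone.
test = np.array([20, 24, 18, 30, 26], dtype=float)
retest = test + 2
print(np.corrcoef(test, retest)[0, 1])                  # Pearson r is 1 despite the shift
print(icc_absolute_single(np.column_stack([test, retest])))  # approximately 0.919
```

The example shows why an absolute-agreement ICC is reported alongside Pearson's r: a uniform shift between occasions leaves r at 1 but lowers ICC(A,1).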
We also applied Generalizability Theory (G theory) to investigate the score dependability of the FACT-Hep. G theory addresses the dependability of measurements and allows simultaneous estimation of multiple sources of variance, including interactions [28][29][30]. We employed a random-effects design, in measurement mode, for both the G-study and the D-study to estimate the variance components and dependability coefficients, using a one-facet crossed design of persons (p) by items (i), denoted p × i. In this design the patients are the object of measurement, not a source of error, and are therefore not considered a facet; the items are the single facet of measurement error. For the G-study, the variance components of persons, of the facet (items in this paper), and of their interaction were calculated. In the D-study, generalizability coefficients (G-coefficients) and dependability indexes (Φ-coefficients) were calculated from the variance components of the object of measurement (p) and items (i): the G-coefficient is the ratio of universe-score variance to expected observed-score variance and is used for relative decisions, while the Φ-coefficient is used for absolute decisions. Alternative D-study designs (p × I) were created by changing the number of items in the measurement facet.
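For the p × i random-effects design, the variance components can be estimated from a two-way ANOVA with one observation per cell, and the D-study coefficients follow from them. With n'_i items, relative error is σ²(pi,e)/n'_i and absolute error is [σ²(i) + σ²(pi,e)]/n'_i. The NumPy sketch below assumes complete data and sets negative variance estimates to zero, a common convention; it is an illustration on simulated data, not the software the authors used. For the current number of items, the G-coefficient of this design equals Cronbach's alpha.

```python
import numpy as np


def g_study_p_by_i(scores):
    """Variance components for a random-effects p x i design.

    scores: (n_persons, n_items) array, one observation per cell, no missing data.
    Returns (var_p, var_i, var_pi_e).
    """
    n_p, n_i = scores.shape
    grand = scores.mean()
    ss_p = n_i * np.sum((scores.mean(axis=1) - grand) ** 2)
    ss_i = n_p * np.sum((scores.mean(axis=0) - grand) ** 2)
    ss_pi = np.sum((scores - grand) ** 2) - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))
    # Solve the expected mean squares; truncate negative estimates at zero.
    var_pi_e = ms_pi
    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    return var_p, var_i, var_pi_e


def d_study(var_p, var_i, var_pi_e, n_items):
    """G (relative) and Phi (absolute) coefficients for a p x I design with n_items items."""
    relative_error = var_pi_e / n_items
    absolute_error = (var_i + var_pi_e) / n_items
    g = var_p / (var_p + relative_error)
    phi = var_p / (var_p + absolute_error)
    return g, phi


# Simulated 6-item domain (not study data): person effect + item effect + noise.
rng = np.random.default_rng(1)
person = rng.normal(size=(200, 1))
item = rng.normal(scale=0.5, size=(1, 6))
scores = person + item + rng.normal(scale=1.5, size=(200, 6))
components = g_study_p_by_i(scores)
for n in (6, 13, 18):
    g, phi = d_study(*components, n_items=n)
    print(f"{n:2d} items: G = {g:.3f}, Phi = {phi:.3f}")
```

Varying `n_items` in `d_study` mirrors the alternative designs described above: the variance components stay fixed and only the error terms shrink as items are added.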
Internal responsiveness was assessed by comparing the mean difference between the first and third assessments (pre-treatment and post-treatment) with paired t-tests. We also calculated the standardized response mean (SRM) and the effect size (ES) [31,32]. The SRM is the mean pre- to post-treatment score change divided by the standard deviation of that change, whereas the ES is the mean change divided by the pre-treatment standard deviation.
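The responsiveness indices can be sketched as follows. Change is taken as post-treatment minus pre-treatment, so, because higher scores mean better QOL, a positive value indicates improvement. The paired t statistic is included to show that it equals SRM × √n; p-values (for example from `scipy.stats.ttest_rel`) are omitted to keep the sketch NumPy-only. The function name and the four-patient scores are hypothetical.

```python
import numpy as np


def responsiveness(pre, post):
    """Return (SRM, ES, paired t) for paired pre-/post-treatment scores."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    change = post - pre                       # positive = improvement (higher score = better QOL)
    sd_change = change.std(ddof=1)
    srm = change.mean() / sd_change           # standardized response mean
    es = change.mean() / pre.std(ddof=1)      # effect size: mean change / pre-treatment SD
    t = change.mean() / (sd_change / np.sqrt(change.size))  # paired t, equal to SRM * sqrt(n)
    return srm, es, t


# Hypothetical scores for four patients (not study data).
srm, es, t = responsiveness([10, 12, 14, 16], [12, 13, 17, 18])
print(f"SRM = {srm:.2f}, ES = {es:.2f}, t = {t:.2f}")  # SRM = 2.45, ES = 0.77, t = 4.90
```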

Socio-demographic And Clinical Characteristics Of Participants
The total sample included 114 inpatients with HCC. The mean age was 51 years (SD: 10; range: 31-73 years). About 80% of patients were male; the majority (96.5%) were married; 80% were of Han ethnicity; and more than 70% had a history of hepatitis. By occupation, 14 patients (12.3%) were workers, 41 (36.0%) were farmers and 59 (51.7%) had other occupations. Regarding educational level, 37 (32.5%) patients had finished primary school, 64 (56.1%) had completed high school or professional secondary school, and 13 (11.4%) had a college degree. Regarding treatment, 16 patients (14.0%) had surgery, 69 (60.5%) had minimally invasive treatments, and 29 (25.5%) had other treatments.

Construct Validity
Table 1 shows the correlations between the items and domains of the FACT-Hep. All correlation coefficients were higher than 0.40, and most were higher than 0.60 (except for a few HCS items, such as Hep1, Hep2 and Hep4, with the HCS domain), indicating strong correlations between items and their own domains and suggesting good item convergent validity. In addition, correlations between items and non-relevant domains were weak, indicating good discriminant validity. For example, the coefficients between the FWB domain and its items (GF1-GF7) were higher than 0.70, exceeding the correlation between this domain and any item in the other domains.

Table 5 in detail), with PWB ranging from 6 to 13, SFWB and FWB from 6 to 10, EWB from 6 to 18, and HCS from 13 to 28. Generally, the G and Φ coefficients increased as the number of items in each domain increased. Under the current designs, the G and Φ coefficients were higher than or close to 0.70 in four of the five domains, the exception being EWB. Table 5 also shows the effect of varying the number of items (from 6 to 22) on reliability, with G ranging from 0.517 to 0.888 and Φ ranging from 0.335 to 0.883.

Responsiveness
Sixty-eight patients completed the questionnaires at the third assessment for the evaluation of responsiveness. As shown in Table 6, the SRMs for PWB and FWB were 0.69 and 0.40 (p < 0.05), indicating statistically significant changes after treatment. In addition, the score changes in the general module, the overall scale and the Trial Outcome Index were statistically significant, with SRMs of 0.56, 0.46 and 0.40 and ESs of 0.50, 0.47 and 0.30, respectively.

Almost all item-domain correlation coefficients met the standards for item convergent and discriminant validity. Overall, the correlations between relevant domains of FACT-Hep and QLICP-LI were higher than those between non-relevant domains. These correlations supported the criterion-related validity and also demonstrated the convergent and divergent validity of the domains.
In addition, we applied both traditional Classical Test Theory analysis and Generalizability Theory in the present study, and both G-coefficients and Φ-coefficients were presented. The G- and Φ-coefficients change as the number of items is assumed to change. As Table 5 shows, for four of the five domains (the exception being EWB), the G-coefficients and indexes of dependability were all greater than or close to 0.70 for the current design and changed little as the number of items changed. Therefore, the current items are considered reasonable and acceptable for these domains. For the EWB domain, we estimated a G-coefficient of 0.517 and an index of dependability of 0.335 for the current design, both below the acceptable level of 0.70, so the items of this domain need to be improved. For an alternative design with 13 items, the G-coefficient was estimated to be 0.699. In sum, the Generalizability Theory analysis further confirmed the reliability of the scale, although the number of EWB items should be increased, if possible, to achieve acceptable dependability.
In terms of responsiveness, we assessed the score changes between the pre-treatment and post-treatment assessments with the classical paired t-test as well as two important indicators, SRM and ES. Our study showed moderate and large responsiveness for PWB, FWB, the general module and the overall scale, which supports the good responsiveness of the Chinese version of FACT-Hep. Some domains did not show statistically significant changes in our study, which may be explained by two reasons: (1) the observation period (about four weeks) might be too short to observe significant changes; (2) the scores in these domains may genuinely not change. For the HCS domain, another possible reason is that some patients improved after treatment while others worsened, so no change could be found when all patients' scores were pooled.
Although our study showed that the Chinese version of FACT-Hep is a reliable and valid instrument, some limitations should be mentioned. First, nearly half of the participants did not complete the third assessment because they were not in the wards when the investigators attempted to interview them at the appointed times, for a variety of reasons (e.g., going to other departments for treatment, or being discharged early for financial reasons). This may have influenced the responsiveness evaluation, although the influence might be slight if these events happened randomly. Second, the participants were recruited only from hospital inpatients. We recommend testing the psychometric properties of the instrument in other populations, such as outpatients at clinic visits.

Conclusions
Our study shows that the Chinese version of FACT-Hep has good validity, reliability, and responsiveness. It can be used to measure QOL in patients with hepatocellular carcinoma in mainland China. However, its responsiveness needs to be tested in other settings, such as outpatients at clinics, and in larger samples.

Declarations
Ethics approval and consent to participate
The study protocol was approved by the Institutional Review Board (IRB) of the investigators' institutions and the hospitals (IRB of the First Affiliated Hospital of Guangdong Medical University, PJ2012052). The respondents participated voluntarily and provided consent for participation.

Consent to publish
The authors understand and agree to publish.

Availability of data and materials
The data are available on request.