Health-related quality of life in women with breast cancer: a review of measures

Background To identify and describe the breast cancer–specific health-related quality of life (HRQoL) instruments with evidence of validation in the breast cancer population for potential use in patients treated for breast cancer (excluding surgery).
Methods We conducted a systematic literature review using PubMed, Embase, and PsycINFO databases to identify articles that contain psychometric properties of HRQoL instruments used in patients with breast cancer. Relevant literature from January 1, 2009, to August 19, 2019, was searched. Articles published in English that reported psychometric properties (reliability, validity) of HRQoL instruments were identified.
Results The database search yielded 613 unique records; 131 full-text articles were reviewed; 80 articles presented psychometric data for instruments used in breast cancer (including generic measures). This article reviews the 33 full articles describing psychometric properties of breast cancer-specific HRQoL instruments: EORTC QLQ-C30, EORTC QLQ-BR23, FACT-B, FBSI, NFBSI-16, YW-BCI36, BCSS, QuEST-Br, QLICP-BR, INA-BCHRQoL, and two newly developed unnamed measures, one by Deshpande and colleagues (for use in India) and one by Vanlemmens and colleagues (for use among young women and their partners). The articles that described the EORTC QLQ-C30, QLQ-BR23, and FACT-B centered on validating translations, providing additional support for content validity, and demonstrating acceptability of electronic patient-reported outcome administration. Psychometric properties of the measures were acceptable. Several new measures have been developed in Asia with an emphasis on cultural relevance/sensitivity. Others focused on specific populations (e.g., young women with breast cancer).
Conclusions Historically, there have been limited options for validated measures to assess HRQoL of patients with breast cancer. A number of new measures have been developed and validated, offering promising options for assessing HRQoL in this patient population. This review supports the reliability and validity of the EORTC QLQ-C30 and FACT-B; new translations and electronic versions of these measures further support their use for this population.


Background
Patient-reported outcomes (PROs) are defined as a "measurement of any aspect of a patient's health status that comes directly from the patient (i.e., without the interpretation of the patient's responses by a physician or anyone else)" [1]. Patient-reported outcome measures (PROMs) provide an opportunity for patients to indicate the impact of a disease and its treatment on their lives. Health-related quality of life (HRQoL) represents a patient's physical, psychological, and social response to disease and therapy and is one type of PRO [2]. PROs can provide additional information to help with treatment approval, reimbursement, and selection/dosing decisions; management of medication side effects; health monitoring; and patient-provider decision-making.

*Correspondence: msalas@dsi.com. Epidemiology, Clinical Safety and Pharmacovigilance, Daiichi Sankyo, Inc., 211 Mount Airy Road, 1A-453, Basking Ridge, NJ 07920, USA.
Breast cancer is the most commonly occurring cancer in women, with an estimated 2 million new cancer cases diagnosed globally in 2018 [3]. Advanced breast cancer has been described as a generally incurable, yet treatable disease, and the primary goals of treatment are to reduce symptom burden, maintain quality of life (QoL), and prolong survival [4,5]. Treatment for patients with advanced disease includes neoadjuvant chemotherapy, surgery, postsurgical radiation therapy, and systemic adjuvant therapy (including hormone therapy for those with hormone receptor-positive breast cancers). Understanding the impact of treatment on patients' HRQoL outside clinical trials can provide useful information for patients and clinicians in making treatment decisions.
Several PROMs have been used to assess patients with breast cancer. PROMs are questionnaires that capture patients' feelings and functioning in a structured manner and consist of items and corresponding response options. When developed and validated according to international guidelines, PROMs can provide reliable and valid patient assessment.
A systematic literature review by Nguyen et al. [6] indicated that the European Organization for Research and Treatment of Cancer (EORTC) Breast Cancer-Specific Quality of Life Questionnaire-23 item (QLQ-BR23) and the Functional Assessment of Cancer Therapy-Breast (FACT-B) are the only HRQoL questionnaires that have been developed specifically for patients with breast cancer facing different disease stages and treatments. Both tools act as supplements to their general cancer questionnaires, the EORTC Quality of Life Questionnaire, Version 3.0 (QLQ-C30) and the FACT-G, respectively. Given recent developments in breast cancer treatment, we sought to determine whether additional valid and reliable HRQoL measures are available in the public domain. Specifically, we aimed to identify breast cancer-specific HRQoL measures with evidence of validation in the breast cancer population for potential use in patients who underwent systemic treatment for breast cancer (excluding surgery and radiotherapy).
As HRQoL measures are focused on patients' overall health and well-being, regional characteristics and traditions may be included. This review focused on identifying HRQoL measures regardless of whether or not they were region specific.

Literature search
The literature review was conducted on August 19, 2019, in the PubMed, Embase, and PsycINFO databases. Table 1 presents the search strategy used for PubMed; the key words used were translated for each of the individual databases. The search focused on the past 10 years (January 1, 2009-August 19, 2019), was limited to publications written in English, and excluded commentaries, letters to editors, editorials, book chapters and case reports because they did not contain detailed information on the psychometric properties of the instruments.

Literature review
Unique records that were identified across the three databases were reviewed in accordance with prespecified inclusion criteria. Studies were required to include patients (aged ≥18 years) with breast cancer who were treated with a pharmaceutical intervention and to assess a psychometric property of an HRQoL-focused PROM. Psychometric properties of interest included reliability (internal consistency, Cronbach alpha, test-retest), validity (content, convergent, divergent), and responsiveness. Psychometric properties for different modes of administration (e.g., electronic PRO [ePRO]) or for translations were included. Reasons for exclusion were populations receiving surgery or radiation, studies focused on HRQoL of treatment efficacy only, and studies only considering caregiver burden. References of relevant review articles were reviewed for any pertinent articles not identified in the original search. Two investigators reviewed the abstracts and selected those that fulfilled the inclusion criteria. Any disagreement between investigators was discussed, and the final decision was made by consensus.
During level 1 screening (titles and abstracts), studies that did not meet criteria were excluded. Full texts of included studies were reviewed (level 2 screening) using the same relevance criteria applied at level 1. Upon completion of level 2, an additional criterion was added to focus the review on breast cancer-specific HRQoL instruments only.
Information regarding reliability (internal consistency, Cronbach alpha, test-retest) and validity (content, convergent, divergent) was extracted from the studies. These psychometric properties were analyzed in accordance with prespecified thresholds (e.g., Cronbach alpha > 0.7). In addition, item content of the instruments was reviewed. Figure 1 summarizes the literature review, which identified 613 unique records for level 1 screening, of which 131 full-text articles were reviewed; 80 articles presented psychometric properties for identified PROMs used in breast cancer. This review focuses on the 33 that described psychometric properties of breast cancer-specific HRQoL instruments, including two newly developed unnamed measures [7][8][9]. For each identified PROM, Table 2 provides an overview of the measure's purpose, the domains assessed, and the number of items. Table 3 provides an overview of concepts addressed for each PROM. Table 4 provides an overview of the psychometric articles (instrument, objective, population), and Table 5 provides psychometric qualities of the identified instruments.
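To make the internal consistency criterion concrete, the following is a minimal Python sketch of how a Cronbach alpha coefficient could be computed and compared with the > 0.7 threshold mentioned above. The function names, the threshold constant, and the response data are illustrative assumptions; this is not the analysis code of any reviewed study.

```python
from statistics import variance

# Threshold taken from the example criterion in the text (Cronbach alpha > 0.7).
ALPHA_THRESHOLD = 0.7


def cronbach_alpha(responses):
    """Cronbach alpha for one multi-item scale.

    responses: list of respondents, each a list of k item scores
    (hypothetical layout; complete data with no missing items is assumed).
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def meets_threshold(alpha, threshold=ALPHA_THRESHOLD):
    """True when alpha strictly exceeds the chosen criterion."""
    return alpha > threshold


# Hypothetical responses: 5 patients answering a 4-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, meets criterion: {meets_threshold(alpha)}")
```

The sketch uses sample variances throughout; the coefficient is undefined when the summed score has zero variance, a case that real analysis code would need to handle.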

Comparison of item content
A review of the item content of the identified measures reveals variability in the content that each assesses (Table 3). Most of the PROMs assess not only breast cancer symptoms but also the physical and emotional aspects of the disease. Breast cancer-specific symptoms appear not to be included in INA-BCHRQoL, YW-BCI36, and the new measures [7][8][9]. Physical and emotional or social functioning are included in the QLQ-C30/QLQ-BR23, FACT-B, FBSI, NFBSI-16, QuEST-Br, and QLICP-BR, whereas other concepts are addressed within only one or two PROMs. For example, sexual function is included in the QLQ-BR23 but in no other PROM; body image is only included in the QLQ-BR23 and the YW-BCI36. Vanlemmens and colleagues' measure for young women (< 45 years of age) is focused not only on the patient with breast cancer but also on her partner and their relationship (i.e., couple cohesion, managing children/everyday life) [8]. Regarding the financial impact of living with breast cancer, only the measure by Vanlemmens et al. [8] and the YW-BCI36 include this concept. Approximately two thirds of the identified articles (22/33) focused on the EORTC QLQ-C30, EORTC QLQ-BR23, or the FACT-B.
Internal consistency was assessed using the Cronbach alpha coefficient for translations. The breast symptoms scale for several translations (Chinese [23], Arabic [14], and Mexican-Spanish [13]) was below 0.70; otherwise reliability of the QLQ-BR23 translations met established criteria (Table 5). Test-retest reliability was also established for the Arabic version [10]. Various methods were used to evaluate the validity of the translations, including multitrait scaling and known-groups comparisons. Item-convergent validity was demonstrated (i.e., exceeding the 0.40 criterion) [14,17,23]. The questionnaires differentiated patients with lymphedema from those without [29], differentiated patients with early stage breast cancer versus those with locally advanced breast cancer [11,13], and were responsive to changes following treatment [13]. Additional content validity for signs and symptoms was evaluated by testing the correlation between reported adverse events and responses to the QLQ-C30 [18].
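As an illustration of the item-convergent validity criterion, the sketch below correlates one item with the sum of the remaining items in its own scale (a correction for overlap) and checks the result against 0.40. It is a simplified stand-in for multitrait scaling under stated assumptions: the function names, the use of a Pearson correlation, and the example data are hypothetical and are not taken from the reviewed studies.

```python
from math import sqrt

# Criterion quoted in the text for item-convergent validity.
CONVERGENT_CRITERION = 0.40


def pearson(x, y):
    """Pearson correlation of two equal-length score lists.

    Assumes both lists have nonzero variance.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def item_convergent_validity(scale_rows, item_index,
                             criterion=CONVERGENT_CRITERION):
    """Correlate one item with the rest of its scale (item removed).

    scale_rows: list of respondents, each a list of item scores for
    a single scale. Returns (correlation, exceeds_criterion).
    """
    item = [row[item_index] for row in scale_rows]
    rest = [sum(row) - row[item_index] for row in scale_rows]
    r = pearson(item, rest)
    return r, r > criterion


# Hypothetical responses: 5 patients, one 4-item scale.
rows = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]
for index in range(4):
    r, ok = item_convergent_validity(rows, index)
    print(f"item {index + 1}: r = {r:.2f}, exceeds criterion: {ok}")
```

A full multitrait analysis would also compare each item's correlation with its own scale against its correlations with the other scales (item discriminant validity), which this sketch does not attempt.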
Bjelic-Radisic et al. [12] evaluated whether updates in breast cancer treatment necessitate updating the EORTC QLQ-BR23, which was developed in 1996. A literature review and interviews with patients and health care providers suggested that relevant concepts were missing. The new items form two multi-item scales: a target symptom scale (20 items) and a satisfaction scale (2 items). The target symptom scale can be further divided into three subscales: endocrine therapy scale, endocrine sexual scale, and skin/mucosa scale. Further psychometric validation is underway. Administration of the measures via ePRO also has demonstrated reliability and validity [15,21].

FACT-B
The majority of the FACT-B publications presented reliability and validity data for translations of the measure into Arabic [24], Persian [31], Czech [26], Lebanese Arabic [27], and Chinese [30] (Table 4). One publication presented data regarding the appropriateness of an ePRO application [29]. Two articles compared the properties of the FACT-B (disease-specific measure) with those of a general HRQoL measure, the EQ-5D [25,28].

Other measures
FBSI and NFBSI-16
The Chinese translation of the FBSI has demonstrated adequate test-retest reliability as well as known-group validity and convergent and divergent validity [32]. Garcia et al. [33] sought to develop a new version of the FBSI in accordance with US Food and Drug Administration guidance for PRO measures that provides assessment on a symptom level and improves upon the original FBSI by emphasizing input from patients. Specifically, 52 patients with breast cancer provided their top-priority symptoms/concerns through open-ended interviews and symptom checklists. After patient input was reviewed, eight additional items were added to the original FBSI, creating the NFBSI-16. Conceptual relevance was supported for most items in the NFBSI-16 based on patients' reports of experiencing the concepts as part of their breast cancer experience [34].

YW-BCI36
Christophe et al. [35] developed a questionnaire specifically measuring the subjective experience of nonmetastatic breast cancer in young women (aged 45 years or younger when diagnosed), their perceptions regarding its treatment in their daily life, and the repercussions of the disease. Reliability and validity of the new measure were demonstrated (Table 5).

BCSS
Horigan et al. [36] conducted a large survey of registered patients with breast cancer to further document the content validity of the BCSS. Specifically, the patients were asked to rank 21 issues identified as important to them.
The nine highest ranked items include good QoL, maintaining independence, able to sleep, able to concentrate, perform normal activities, being fatigued, having depression, being anxious, and having pain. The five lowest ranked items include appetite, breast-specific issues, hot flashes, and sexuality. Ratings by breast cancer subset (newly diagnosed, on treatment, no evidence of disease, hormonal or nonhormonal treatment, metastatic disease, survivors) showed some differences compared with those by the whole group.

QuEST-Br
Harley et al. [38] adapted existing HRQoL instruments (EORTC measures) for use in routine clinical practice delivering outpatient chemotherapy for breast cancer. Methods followed the guidelines laid out by the EORTC Quality-of-Life Group for developing questionnaire modules [40]. Internal consistency reliability was > 0.70 for the QuEST-Br scale [38].

QLICP-BR
Wan et al. [37] developed and validated a QoL instrument for patients with breast cancer in China. The measure was developed with particular attention to Chinese culture. For example, the family relationship and kinship play very important roles in daily life. Taoism and traditional medicine focus on good temper and high spirit. Good appetite, sleep, and energy are highly regarded in daily life, and food culture is very important [37]. The QLICP-BR was found to have adequate reliability and validity (Table 5).

INA-BCHRQoL
Saptaningsih et al. [39] developed a new measure to capture not only the physical, cognitive, and psychological aspects of patients but also the spiritual aspect. The questionnaire was developed in Indonesia and was designed to be culturally relevant (i.e., it included a spiritual domain, which is suitable for Indonesia, as it is a very religious country) to the breast cancer population in Indonesia. The INA-BCHRQoL was found to have adequate reliability and validity (Table 5).

Unnamed measures
Deshpande et al. [7] developed and validated a patient-reported questionnaire to assess the QoL outcomes of patients with breast cancer in India. Reliability and content validity were demonstrated (Table 5).
Vanlemmens et al. [8,9] developed and validated a specific inventory for measuring the impact of breast cancer on the QoL of young women (< 45 years of age) with nonmetastatic disease and the QoL of their partners. Reliability (internal consistency and test-retest) results are reported in Table 5.

EORTC QLQ-BR23
Alawadhi et al. [10]
▪ The intraclass correlation for the test-retest statistic and the internal consistency values for the multi-item scales were > 0.7
▪ With the exception of the pain subscale, all items met the item internal consistency criterion of > 0.4 correlation with the corresponding scale.
▪ The QLQ-BR23 performed better than the QLQ-C30 for item discriminant validity
▪ The scale scores discriminated between patients at different disease stages and between sick and well populations.
Bener et al. [11]
▪ 6 of the 9 subscales met the standards of reliability, with coefficients ranging from 0.55 to 0.89
▪ Patients with advanced breast cancer (stages III-IV) had significantly higher symptom scores than those in early stages for physical function, cognitive, fatigue, insomnia, appetite loss, constipation, and financial difficulties.

… results for item discriminant validity were satisfactory, with the exception of item 5, which showed higher correlation with other subscales than with its own physical functioning.
▪ The Spearman interscale coefficients generally were correlated with each other. Results of known-group comparisons did not show significant differences in terms of disease stage. Regarding education level, patients with high school/university education had better functional scale scores only in certain subscales compared with other subgroups; furthermore, patients with secondary school education had better GH/QoL compared with other subgroups of patients.
Simons [18]
▪ NR
Tan et al. [20]
▪ Cronbach alpha coefficient results for EORTC QLQ-C30 and QLQ-BR23 were 0.846 and 0.873, respectively
▪ The correlation between the EORTC QLQ-C30 and EQ-5D QoL instruments showed a moderately strong linear relationship (r = 0.597; P < 0.001)
Wallwiener et al. [21]
▪ No differences in terms of acceptance between paper and electronic patient-reported outcome
▪ No significant difference in response behavior between paper and electronic patient-reported outcome
▪ NR
Wallwiener et al. [22]
▪ High correlations were shown for both dimensions of reliability (parallel forms reliability and internal consistency) in the patient's response behavior between paper- and electronic-based questionnaires

▪ NR
▪ In a cross-sectional setting, the differences in the effect size favored EQ-5D-5L and the 90% CIs totally fell within the zone that indicated the noninferiority of the EQ-5D-5L (e.g., oncologist-assessed performance status: −0.26 to 0.04; patient-assessed performance status: −0.48 to −0.16; current evidence of disease: −0.28 to 0.08). In a longitudinal setting, the FACT-B showed larger effect sizes and ICCs than the EQ-5D-5L. The 90% CIs, however, overlapped the noninferiority margin; thus, noninferiority in these two aspects could not be confirmed
Jarkovsky et al. [26]
▪ Similar to other validations of FACT-B translations; good reliability, sensitivity, and reliable internal structure after translation
▪ NR
Kobeissi et al. [27]
▪ NR
▪ The following questions were perceived to be most important: ability to meet the needs of my family, pain, emotional support, worry that my condition will get worse, sleep, worry that other family members will get the disease, change in weight, and pain in different areas of the body.
▪ Instrument was perceived to be adequate, appropriate for use, culturally sensitive, simple, and exhaustive.
Lee et al. [28]
▪ For test-retest reliability, the confidence intervals of the differences in ICC overlapped the noninferiority margin
▪ Using performance status, evidence of disease, and treatment status as criteria, the differences (FACT-B minus EQ-5D-3L) in the effect size for discriminative ability were negative or close to 0, and the 90% confidence intervals (CIs) fell within the zone that indicated noninferiority of EQ-5D-5L
▪ For responsiveness, the CIs of the differences in effect size overlapped the noninferiority margin (difference in effect size (90% CI), FACT-B vs. EQ-5D-5L [0.04
Matthies et al. [29]
▪ High correlations were shown for both dimensions of reliability (parallel forms reliability and internal consistency) in the patients' response behavior between paper-based and electronically based questionnaires; regarding the reliability test of parallel forms, no significant differences were found in 35 of 37 single items, while significant correlations in the test for consistency were found in all 37 single items, in all 5 sum individual item subscale scores, and in total FACT-B score
▪ NR

INA-BCHRQoL
Saptaningsih et al. [39]
▪ Cronbach alpha values for the physical, cognitive, social, and spiritual domains were higher than 0.8, and the corrected item-total correlation was also higher than 0.3
▪ Each domain of the questionnaire was not influenced by the treatment options.
▪ 24 patients with early stage breast cancer (10 FAC-based chemotherapy and 14 taxane-based chemotherapy) were enrolled in the main study, and the score of HRQoL obtained from INA-BCHRQoL was considerably high.

Unnamed
Deshpande et al. [7]
▪ Cronbach alpha value for the questionnaire was 0.93
▪ Patients understood the questionnaire and found the items to be relevant, indicating content validity.
▪ The statistical assessment of the scores did not show an association between scores and age or stage of breast cancer, as the sample size was small.
Vanlemmens et al. [9]
▪ Participants reported on 8 dimensions of their quality of life during treatment and follow-up: psychological, physical, family, social, couple, sexuality, domestic, professional, economic
▪ Very few differences were found between the 4 groups (chemotherapy, Table 5).

Discussion
Understanding the effect of breast cancer treatment on a patient's HRQoL is a central clinical and research question. However, to accurately assess HRQoL, valid and reliable PROMs are needed: that is, PROMs with evidence of reliability, validity, and responsiveness in the population of interest (breast cancer). This review sought to identify disease-specific HRQoL measures with evidence of validation in the breast cancer population for potential use in patients who underwent systemic treatment for breast cancer (excluding surgery and radiotherapy). In addition to the EORTC QLQ-C30, QLQ-BR23, and FACT-B, we identified an additional nine potential measures. The identified PROMs vary in the content that they assess. For example, Vanlemmens and colleagues' measure for young women (< 45 years of age) is focused not only on the patient with breast cancer but also on her partner and their relationship (i.e., couple cohesion, managing children/everyday life) [9]. This measure also assesses the impact of breast cancer on the woman's career and finances. Apart from this measure and the YW-BCI36, none of the instruments assess the impact of breast cancer on a woman's career or finances. Conversely, most do assess not only breast symptoms but also the physical and emotional/psychological impact of disease.
The YW-BCI36 [35] and the measure developed by Vanlemmens et al. [8,9] were both developed specifically for women < 45 years old with breast cancer. Younger women with breast cancer have concerns (e.g., childcare, financial) that older women with breast cancer may not; thus, these measures were developed specifically for this population. Several other PROMs were developed to meet a specific unmet need within regions (China, Indonesia, India) for measures that were culturally appropriate (QLICP-BR [37], INA-BCHRQoL [39], Indian breast cancer measure [7]). Given that these PROMs have been developed to be culturally relevant for a specific region/population, they may not be appropriate for global studies.
Psychometric qualities that may be examined in the evaluation of an instrument include acceptability, validity, reliability (including internal consistency and test-retest reliability), and responsiveness. Information on questionnaire responsiveness (the ability of a scale to detect significant change over time, assessed by comparing scores before and after an intervention of known efficacy) was scarce, regardless of the method used to examine it (t tests, effect sizes, standardized response means, or responsiveness statistics).
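Two of the responsiveness indices named above can be sketched briefly in Python. The example assumes the common definitions in which the effect size divides the mean change by the standard deviation of baseline scores and the standardized response mean divides it by the standard deviation of the change scores; the function names and the paired scores are hypothetical and do not come from any reviewed study.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change (follow-up minus baseline) for paired scores."""
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical scale scores for 6 patients before and after treatment.
baseline = [52, 61, 47, 70, 58, 64]
follow_up = [60, 66, 55, 71, 67, 70]

print(f"effect size = {effect_size(baseline, follow_up):.2f}")
print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
```

Both indices express change in standard deviation units, which is why they are often reported alongside t tests when before-and-after scores are available for an intervention of known efficacy.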
The FACT-B and the QLQ-BR23 were designed for use in patients with breast cancer with a range of disease stages and undergoing different treatments. The EORTC QLQ-BR23 and the FACT-B are well-developed instruments that have been extensively tested among patients with breast cancer. The FACT-B is shorter than the QLQ-BR23 and covers fewer symptoms and treatment-related side effects. This review has identified additional translations of the measures, providing further evidence of their validity. Internal consistency estimates of reliability were adequate for research purposes, although the internal consistency estimates were somewhat lower for the cognitive and breast symptoms scales. Further psychometric testing of the Breast Cancer-Specific Quality of Life Questionnaire-45 item (QLQ-BR45) may provide improved results. A recent publication [41] provides more detailed updates on the development of the QLQ-BR45. The QLQ-BR23, developed in 1996, was one of the first disease-specific questionnaires to assess QoL in patients with breast cancer. Given the effects of newer therapeutic options available since then, the developers believed that the original 23-item QLQ-BR23 may not cover many important QoL issues and potential side effects of newer treatments. Therefore, the EORTC Quality of Life Group decided to update this module, eventually creating the QLQ-BR45. The development of the QLQ-BR45 involved a systematic literature review to identify relevant QoL issues for patients, interviews with patients and providers, and pretesting of a preliminary module in an international phase 3 study. Results of the literature review and discussions with patients and providers indicated that the original QLQ-BR23 inadequately covered concepts currently relevant to patients with breast cancer. Thus, new items were developed (added to the existing QLQ-BR23) and pretested in a multinational study, resulting in the QLQ-BR45, which is currently undergoing further psychometric testing.
Much of the additional psychometric data for the EORTC QLQ-C30, QLQ-BR23, and FACT-B are from new translations, further confirming the acceptability of these measures. Reliability of tablet or ePRO versions of the measures was also confirmed. New instruments were developed de novo to be more culturally relevant to patients in Asian countries. While these measures have demonstrated adequate internal consistency, test-retest and responsiveness data are lacking.

Conclusions
Although historically there have been limited options for validated measures to assess HRQoL in patients with breast cancer, a number of new options for assessing HRQoL in the breast cancer population have been developed and validated in recent years. This review supports the reliability and validity of the EORTC QLQ-C30 and FACT-B; new translations and electronic versions of these measures further support their use for this population. Researchers should ensure that their selected PROMs are suitable for their target patient population, anticipated line of therapy, and the expected side effects of the therapies involved.