Repeatability and learning effect in the 6MWT in preoperative cancer patients undergoing a prehabilitation program

The main objective was to assess the repeatability and learning effect of the 6-min walk test (6MWT) in a cohort of preoperative cancer patients referred to a prehabilitation program. As a secondary objective, we aimed to identify determinants of improvement in the second test. This was a secondary analysis of a large prospective study on the implementation of a multimodal prehabilitation program in a real-life scenario. Eligible patients were assessed at baseline before starting the prehabilitation program. The 6MWT was conducted according to the American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines, with two tests performed under identical conditions separated by 30 min. The distance covered (in meters) and the physiological responses (heart rate, oxygen saturation, fatigue, and dyspnea) to each test were recorded and compared. A total of 170 patients (60.9%) were analyzed. Repeatability of the distance covered in the 6MWT was excellent (ICC = 0.98; 95% CI: 0.92–0.99), but a mean increase of +19.5 m (95% CI: 15.6–23.5 m; p < 0.001) was found in the second test, showing a learning effect, with limits of agreement between −31.3 and 70.4 m. The coefficient of variation was 4%. No clinical factor was found to be associated with an improvement in the second test. The 6MWT showed excellent repeatability in preoperative cancer patients, but a significant learning effect is present. No factors associated with a clinically meaningful improvement in the second test were identified. In light of these findings, two attempts of the 6MWT should be encouraged in this population.


Introduction
The 6-min walk test (6MWT) is a very common measurement of physical performance in different clinical populations [1,2], especially in cardiorespiratory settings [3][4][5]. Although several variants of the test have been proposed, including 12-, 6-, 3-, and 2-min versions, the 6-min version remains the most widely used in both clinical practice and research because it has shown the best test-retest reliability and reproducibility [6]. Over the past decades, as a result of noticeable growth in the field of oncology rehabilitation and exercise, the use of the 6MWT has increased in cancer patients, although studies on its test-retest reliability in this population are scarce [7]. In patients with cardiorespiratory diseases, where the 6MWT has been most widely applied, the test-retest reliability was found to be excellent, with intra-class correlation coefficients (ICC) ranging from 0.88 to 0.99 [8][9][10][11][12]. However, despite this high correlation, most studies have reported improvements in the distance covered during the second test, indicating a learning effect that has led to the recommendation of performing a retest [9,13]. In cancer patients, the level of test-retest agreement of the 6MWT has been assessed in a sample of 30 patients across the disease continuum, showing an ICC of 0.93 (95% CI: 0.86–0.97) and a coefficient of variation of 3% [14]. In line with what has been observed in other clinical populations [15], the 6MWT also showed a learning effect in that study, with 80% of patients walking farther in the second test (+16.6 ± 29.9 m; 95% CI: +5.5 to +27.8 m), an increase that falls within the minimal clinically important difference (MCID) range of 14 to 30.5 m reported for several clinical populations [16].
The 6MWT is widely used to assess functional capacity before and after a prehabilitation program in the setting of cancer surgery [17] as well as to stratify patients according to surgical risk [18], given that functional capacity has been linked to postoperative outcomes in several surgical populations [19][20][21][22]. Therefore, accurate measurement of functional capacity is important at baseline assessment, as it might indicate the need for further, more complex tests (such as a cardiopulmonary exercise test). Furthermore, accounting for a learning effect ensures that subsequent improvements in the test attributed to a prehabilitation program are not simply due to the lack of a practice test. However, to date no study has assessed the repeatability (also termed test-retest reliability in the literature) of the 6MWT in a large sample of cancer patients scheduled for tumor resection surgery. Because the terms test-retest reliability, reproducibility, and repeatability are frequently used interchangeably in the scientific literature to describe the measurement error of a test over time, it is important to distinguish these terms and define their scope of application. As repeatability specifically refers to the level of agreement between two measurements undertaken within a short period of time under identical conditions [23], we will focus on this term for the purpose of this study.
Therefore, the primary aim of this study was twofold: (1) to assess the within-subject repeatability of the 6MWT in a cohort of early-diagnosed cancer patients awaiting major surgery and (2) to quantify the learning effect of the test in this population. As a secondary objective, we aimed to identify potential determinants of distance improvement in the second test.

Design
This study is a sub-analysis stemming from a previous large prospective longitudinal study on the implementation of a multimodal prehabilitation program in a real-world scenario (NCT02976064). Patients were informed on the purpose of the study by an anesthesiologist and signed informed consent prior to any testing. The study was approved by the local ethics committee (HCB/2016/0883).

Participants
Eligible participants were consecutive patients scheduled for tumor resection surgery at Hospital Clinic de Barcelona between 2018 and 2020 who were invited to participate in a multimodal prehabilitation program. Specific inclusion criteria were as follows: (1) patients undergoing major oncology surgery (digestive tract, gynecological, pancreatic, thoracic, urological); (2) high risk for postoperative complications (i.e., American Society of Anesthesiologists (ASA) physical status classification III-IV and/or age ≥ 70 years and/or Duke Activity Status Index (DASI) < 46 points) [24]; (3) severe deconditioning in patients undergoing highly aggressive surgeries (i.e., Clinical Frailty Scale (CFS) ≥ 4 and/or diagnosis of malnutrition according to the Global Leadership Initiative on Malnutrition (GLIM) criteria).
Exclusion criteria included patients with severe musculoskeletal, neurological, or cognitive impairments who were unable to complete the tests and questionnaires in the prehabilitation program and those scheduled for surgery within 3 weeks.

Procedures
Patients who met the inclusion criteria were scheduled for a baseline assessment in our Prehabilitation Unit. Patients were assessed by four different healthcare professionals: an anesthesiologist, who assessed surgical risk (ASA score), comorbidities (Charlson Comorbidity Index), performance status (DASI), and frailty (CFS); a nutritionist, who assessed the nutritional profile, including prealbumin and albumin levels and malnutrition according to GLIM criteria; a psychologist, who explored expectations regarding surgery, including anxiety and depression status using the Hospital Anxiety and Depression Scale (HADS), as well as readiness to change; and a physiotherapist, who performed a battery of physical tests including handgrip strength, 6MWT, and the 30-s Sit-to-Stand (STS) test, in addition to an assessment of physical activity levels using the Yale Physical Activity Survey (YPAS) [18].
The 6MWT was performed in accordance with the American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines [4]. For each patient, both attempts were supervised by the same trained physiotherapist, who walked behind the patient, and the two attempts were separated by at least 30 min. A total of four highly trained physiotherapists participated in the data collection. In between tests, patients rested, completed baseline questionnaires, or attended consultations with the healthcare professionals involved in the prehabilitation program. Patients were encouraged to walk as fast as they could for 6 min along an indoor, flat, 30-m corridor. Standardized encouragement was given each minute, as recommended. Oxygen saturation (SpO2), heart rate, dyspnea, and leg fatigue according to the modified Borg Scale [25] were recorded at the beginning and at the end of each test, as well as during the first 2 min of recovery. Oxygen saturation and heart rate were also continuously monitored and recorded at each lap using a portable pulse oximeter (Nonin® 3150 Wrist Oximeter). Oxygen desaturation was defined as a decrease in SpO2 ≥ 4% from baseline.
Once the patient was evaluated, a 4-week personalized prehabilitation program including exercise training, nutritional and psychological support, and medical optimization was initiated. Before surgery, patients were re-assessed in terms of nutritional, psychological, and exercise training adherence with the 6MWT being used to monitor the functional capacity improvement.

Statistical analyses
A descriptive analysis of relevant baseline data was initially performed, including the responses to both the first and second 6MWT. Normal distribution of the data was assessed using the Kolmogorov-Smirnov test. The intraclass correlation coefficient (ICC) and the coefficient of variation (CV) were used to assess repeatability of the two 6MWT, with 95% CI added to assess for systematic error between test and retest. The CV was calculated as the ratio between the standard deviation (SD) and the mean. A Bland-Altman plot was used to evaluate agreement between the first and second tests. Proportional bias was analyzed through a linear regression model. Paired t tests were conducted to compare physiological variables of both 6MWT after checking for normality. Ordinal data were assessed with nonparametric tests (chi-square). Finally, independent t tests were used to compare patients' characteristics between those who improved in the second test and those who did not. Bivariate comparisons were also performed to identify potential correlations between the variables studied and an increase of ≥ 14 m in the second 6MWT, the lower limit of the minimal clinically important difference (MCID) found in several clinical populations [16]. Analyses were performed using SPSS version 21 (IBM Corporation, Chicago, IL). A p value < 0.05 was considered statistically significant in all analyses.
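The agreement statistics described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function names and the four paired distances are hypothetical, the limits of agreement are taken as the mean difference ± 1.96 SD of the differences, and proportional bias is approximated by the ordinary least squares slope of the difference on the pairwise mean.

```python
import math


def bland_altman(first, second, z=1.96):
    """Return (bias, sd_diff, lower_loa, upper_loa) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists with at least two patients")
    # Difference is second minus first, so a positive bias means a longer retest.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the paired differences (n - 1 denominator).
    sd_diff = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, sd_diff, bias - z * sd_diff, bias + z * sd_diff


def proportional_bias_slope(first, second):
    """OLS slope of the (second - first) difference on the pairwise mean.

    A slope close to zero suggests the difference does not depend on the
    magnitude of the measurement. Requires at least two distinct pairwise means.
    """
    diffs = [b - a for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    mean_x = sum(means) / len(means)
    mean_y = sum(diffs) / len(diffs)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    return sxy / sxx


# Hypothetical distances in meters for four patients (illustration only).
test1 = [400.0, 450.0, 500.0, 550.0]
test2 = [420.0, 465.0, 525.0, 560.0]

bias, sd_diff, lower, upper = bland_altman(test1, test2)
slope = proportional_bias_slope(test1, test2)
print(f"bias = {bias:.1f} m, limits of agreement = {lower:.1f} to {upper:.1f} m")
print(f"proportional bias slope = {slope:.3f}")
```

With these made-up values the mean difference is 17.5 m. In a real analysis the slope would be tested against zero within the regression model rather than inspected by eye.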

Results
Baseline characteristics of the patients are summarized in Table 1. Between May 2018 and March 2020, 279 oncology patients were consecutively screened for prehabilitation. We were able to obtain full data on both 6MWT in 170 patients (60.9%). Seventeen patients (6%) were excluded because logistical constraints prevented them from attending the prehabilitation program, and they were not evaluated; 72 patients (25.8%) did not complete a second 6MWT for several reasons: lack of time or motivation (n = 47), excessive fatigue/severe dyspnea or being too deconditioned to repeat the test (n = 7), having had another 6MWT performed within the past 3 months (n = 5), musculoskeletal or cancer-related limitations causing pain during the test (n = 10), or other reasons (n = 3); and in 20 patients (7.2%) some data from the second test were missing (incomplete heart rate and/or oxygen saturation monitoring due to pulse oximeter malfunction or artifacts during the test, such as cold fingertips, or missing information on final Borg dyspnea/fatigue). No differences were found in baseline characteristics (ASA score, type of surgery, mean age, etc.) between included and excluded participants (p > 0.05).
The ICC for the distance walked during the 6MWT showed excellent repeatability between both tests (ICC = 0.98; 95% CI: 0.92–0.99) and the coefficient of variation (CV) was 4%. Patients walked a mean distance of 456 ± 111 m during the first test and 475 ± 112 m during the second test, corresponding to a mean increase in the distance walked during the second test of 19.5 m (95% CI: 15.6–23.5 m; p < 0.001) with limits of agreement between −31.3 and 70.4 m (Fig. 1). The improvement was similar between men and women as well as between older and younger subjects (Table 2).
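As a plausibility check on these figures, the SD of the paired differences can be back-calculated from the reported confidence interval and then used to recompute the limits of agreement. The sketch below rests on two assumptions that are not stated in the text: that the interval is a paired t interval on n = 170 differences (1.974 approximates the two-sided 95% t quantile for 169 degrees of freedom), and that the limits of agreement are the mean difference ± 1.96 SD.

```python
import math

# Values reported in the text.
n = 170
mean_diff = 19.5              # mean increase in the second test, in meters
ci_low, ci_high = 15.6, 23.5  # 95% CI of the mean increase, in meters

# Assumptions (not stated in the source): paired t interval, normal-theory limits.
T_CRIT = 1.974  # approximate t quantile (0.975, 169 degrees of freedom)
Z_LOA = 1.96    # multiplier conventionally used for Bland-Altman limits

# Half-width of the CI = T_CRIT * SE, and SE = SD / sqrt(n).
se = (ci_high - ci_low) / (2 * T_CRIT)
sd_diff = se * math.sqrt(n)

loa_lower = mean_diff - Z_LOA * sd_diff
loa_upper = mean_diff + Z_LOA * sd_diff

print(f"implied SD of differences: {sd_diff:.1f} m")
print(f"implied limits of agreement: {loa_lower:.1f} to {loa_upper:.1f} m")
```

Under these assumptions the implied SD is about 26 m and the implied limits are about −31.6 and 70.6 m, close to the reported limits once the rounding of the interval endpoints is taken into account.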
Physiological responses to both 6MWT are found in Table 2. We observed an increase in the maximum heart rate achieved during the second test, both in absolute values and in percentage of maximum predicted (70.6% vs. 72.5% respectively; p = 0.006). The ICC for the mean change in HR between both tests (maximum-baseline) was 0.71 (95% CI: 0.6-0.78; p < 0.0001). No differences were found in final dyspnea and leg fatigue (p > 0.05) between the first and the second attempt, with ICC ranging from 0.6 for dyspnea to 0.71 for leg fatigue. Oxygen saturation responses showed a desaturation (decrease from baseline ≥ 4%) in 12 patients (7.1%) during the first test and 9 (5.3%) during the second (p < 0.0001).
The vast majority of the patients (80%) walked farther on the second test and 58% of those increased the distance walked by 14 m or more. When comparing patients who improved by ≥ 14 m in the second 6MWT with those who improved by < 14 m, no differences were found for any of the assessed variables (p > 0.05) (Table 3). In addition, no significant correlation was found in the univariate analysis between any of the clinical characteristics and improving ≥ 14 m in the second test.

Discussion
This study found that the 6MWT appears to be highly repeatable in oncology patients scheduled for major surgery. Nonetheless, the high proportion of patients increasing the distance covered during the second 6MWT shows a learning effect in this population, as observed previously in other clinical settings [15]. Unfortunately, no determinant of improvement was found among the variables studied. These findings suggest that at least two attempts should be performed in preoperative cancer patients to accurately determine their functional capacity (with the highest distance being selected for clinical or other purposes).
In recent years, increasing evidence has been generated on the potential of prehabilitation to optimize physical and psychological resilience to cope with the stress generated by major oncology surgical procedures [26], and 6MWT distance is usually considered an outcome of prehabilitation programs [27]. Guidelines for performing the 6MWT were first published by the American Thoracic Society (ATS) over 20 years ago [28] and initially stated that a practice test was not necessary in most clinical scenarios, as the mean change from the first to the second test was considered to be low. However, several authors have since recommended using a familiarization trial to establish a baseline 6MWT [9,29,30], while other studies have dismissed the need for a second test [31,32]. In patients with chronic respiratory diseases as well as cardiac failure, studies have shown a significant difference in the distance covered between the first and the second test, with improvements ranging from 24 to 29 m [15]. Indeed, a systematic review and meta-analysis conducted in 2014 in COPD found a pooled mean increase of 26.1 m in the second test [8]. Furthermore, this learning effect seems to exist not only when the tests are performed on the same day but also when they are separated by several weeks or months [30,33]. In cancer patients, the study conducted by Schmidt et al. [14] also showed a mean increase in the second test of 16.6 ± 29.9 m (95% CI: 5.5 to 27.8 m), which falls within the MCID range of 14 to 30.5 m reported for several clinical populations [16]. This finding should be taken into consideration in clinical practice; otherwise, subsequent improvements in the test attributed to an intervention (such as a prehabilitation program) might be misleading.
However, as performing a second 6MWT is time-consuming (an additional 40 min including resting from the first test and performing the second) and is sometimes not well accepted by the patient (severe deconditioning, musculoskeletal impairments, etc.), it would be of great interest to identify potential factors associated with improvement in the distance walked during the second test. This would allow us to select those patients in whom a second test might be necessary for reliable results. Unfortunately, we were not able to identify any determinant of significant improvement among the clinical and anthropometric variables assessed in this study. In addition, no differences in the magnitude of the learning effect were found between men and women or between older and younger patients, which suggests that the variability is due to within-subject factors (motivation, readiness, fatigue, etc.). Therefore, two attempts of the test should be encouraged in this population to accurately assess functional capacity. Nevertheless, in some scenarios, the clinical decision on whether to perform a practice test could depend on the purpose of the assessment [34] as well as on the baseline level of functional capacity (e.g., patients walking ≥ 500 m might not need a second test if the purpose is to assess surgical risk, as they would already be considered low-risk). More studies are warranted in this line to determine whether one test could be enough in these cases. Despite the learning effect reported in many clinical populations, the 6MWT has shown excellent test-retest reliability and reproducibility across different populations, including patients with chronic cardiovascular and respiratory diseases [15,31,35] and neurological conditions [36,37] as well as healthy subjects [31].
However, when comparing studies, we observe that the terms reproducibility, replicability, and reliability are sometimes used interchangeably although they actually refer to different test properties, which hinders the interpretation and comparison of results across studies. While reproducibility requires measurements to be undertaken under different conditions (e.g., different assessors or different populations), repeatability is measured under identical circumstances over a short period of time [23]. Both constructs are usually assessed with the ICC to quantify reliability between the measurements and/or with a Bland-Altman plot to determine agreement [38]. In our study, as in previous studies with smaller samples of cancer patients [7,39], we found an ICC of 0.98 between two 6MWT conducted under identical conditions, showing excellent repeatability. In addition, the coefficient of variation was very low and similar to that of previous studies [5,14]. Furthermore, no proportional bias was found, showing that the limits of agreement do not change significantly with different mean values of the test. So, although this is not the first study to evaluate the reliability and repeatability of the test in a cancer population, it is the one with the largest and most heterogeneous sample. In addition, the quality of our data was reinforced by the replicability of the physiological responses observed in both tests (ICC ranging from 0.6 to 0.83).
This study has some limitations that we must address. To start with, this is a sub-analysis derived from a large longitudinal study whose main objective was to analyze the feasibility and effectiveness of a multimodal prehabilitation program in a real-world scenario. Therefore, the sample size might not be adequate for some of the analyses performed (e.g., identifying potential factors associated with a meaningful change in the second 6MWT), although similar studies have been conducted with sample sizes considerably smaller than ours [14,39]. In addition, a selection bias might have occurred, as some of the patients were excluded from the final sample due to missing data on the 6MWT or refusal to perform a second test. However, after conducting an exploratory analysis between excluded and included patients, no relevant differences in clinical characteristics were found besides the excluded patients being more likely to be men (p < 0.05).
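A small numerical sketch may help show why a very high ICC and a systematic learning effect can coexist. The ICC model used in this study is not specified here, so the code below is only an assumption-laden illustration: it computes single-measure, two-way ICCs for two tests in both the consistency and the absolute-agreement form (McGraw and Wong formulas), with hypothetical distances in which every patient walks exactly 20 m farther on the retest.

```python
def icc_two_way(first, second):
    """Return (consistency ICC, absolute-agreement ICC), single measures, k = 2."""
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = sum(a + b for a, b in rows) / (n * k)
    row_means = [(a + b) / k for a, b in rows]         # one mean per patient
    col_means = [sum(first) / n, sum(second) / n]      # one mean per test

    # Two-way ANOVA sums of squares: patients (rows), tests (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement


# Hypothetical distances in meters: a uniform 20 m gain on the second test.
test1 = [400.0, 450.0, 500.0, 550.0]
test2 = [d + 20.0 for d in test1]

consistency, agreement = icc_two_way(test1, test2)
print(f"consistency ICC = {consistency:.3f}, agreement ICC = {agreement:.3f}")
```

In this toy example the consistency form equals 1 because the ranking of patients is perfectly preserved, whereas the absolute-agreement form is lower because it penalizes the systematic shift between tests. This is one reason the ICC is usefully complemented by the mean difference and the limits of agreement.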
In summary, this study provides new evidence that the 6MWT is a highly replicable test with consistent agreement to assess functional capacity in preoperative cancer patients who are scheduled for major surgery, but with a significant learning effect. Therefore, our findings support the recommendation of a practice 6MWT at baseline assessment for most clinical scenarios. Future studies are needed to determine whether one test could be enough in some selected cases (for instance, highly functional patients achieving a significant distance in the first test or higher than 80% of predicted).