Optimal Time-point of Response Assessment for Predicting Survival Is Associated With Tumor Burden in Hepatocellular Carcinoma Receiving Repeated Transarterial Chemoembolization

Background: Objective response rate (ORR) under mRECIST criteria after transarterial chemoembolization (TACE) has been identified as a surrogate endpoint of overall survival (OS). However, its optimal time-point remains controversial and may be influenced by tumor burden. We aimed to investigate the surrogacy of initial/best ORR in relation to tumor burden.
Methods: A total of 1549 eligible treatment-naïve patients with unresectable hepatocellular carcinoma (HCC), Child-Pugh score ≤ 7, and performance status score ≤ 1 undergoing TACE between January 2010 and May 2016 at 17 academic hospitals were retrospectively analyzed.
Results: Both initial and best ORRs interacted with tumor burden, defined by our previously proposed "six-and-twelve" criteria. Both initial and best ORRs equivalently predicted and correlated with OS in the low (adjusted HR: 2.55 and 2.95, respectively, both P<0.001; R=0.84, P=0.035 and R=0.97, P=0.002, respectively) and intermediate tumor burden strata (adjusted HR: 1.81 and 2.22, respectively, both P<0.001; R=0.74, P=0.023 and R=0.9, P=0.002, respectively). For the high stratum, only best ORR exhibited qualified prognostic value (adjusted HR: 2.61, P<0.001) with a satisfactory correlation (R=0.70, P=0.035), whereas initial ORR was not statistically significant (adjusted HR: 1.08, P=0.357; R=0.22, P=0.54).


Introduction
Transarterial chemoembolization (TACE) is the recommended first-line non-curative therapy for unresectable hepatocellular carcinoma (HCC), and overall survival (OS) is the standard outcome measurement (1,2). However, setting OS as the exclusive endpoint increases the difficulty of clinical research, since it requires a large sample size and a long follow-up duration to capture sufficient events (3). Consequently, given its early availability and close correlation with survival (4,5), objective response (OR) has been proposed as a robust surrogate endpoint (1,6). OR is defined as the sum of complete and partial response (CR and PR) under the modified Response Evaluation Criteria in Solid Tumours (mRECIST) (7-10).
Nevertheless, the common practice of repeated "on-demand" TACE raises the question of whether the surrogacy of OR should depend on the response after the initial procedure (initial OR) or on the best response achieved during repeated procedures (best OR) (11). Currently, initial and best responses are sometimes bracketed together to predict survival (4,10,12,13). However, inconsistent timing of response assessment across studies leads to variable OR rates (ORRs) and compromises their surrogacy, calling for determination of the optimal timing. Theoretically, initial response could exert surrogacy at the earliest time available, and this has been supported by a recent study (14). However, in that study, subsequent responders without initial OR still exhibited better outcomes than persistent non-responders, indicating that initial response may leave out survival benefits achieved in subsequent responders (9,14,15). On the other hand, several studies favoring best ORR (14,16) reveal the controversy on this issue (Supplementary table 1) (11).
An underlying cause of these controversies is the influence of tumor burden on TACE schedules. OR after initial TACE is more likely to be applicable to a small solitary nodule, obviating the need for subsequent procedures and evaluation, whereas large or multiple nodules usually require repeated TACE due to incomplete necrosis or viable tumor after the initial procedure, so that best ORR becomes intuitively more accurate (17). Hence it is reasonable to hypothesize that tumor burden, a particularly heterogeneous characteristic of TACE candidates, may influence the optimal timing of ORR assessment for predicting survival (11).
The current study aims to investigate the prognostic value of initial and best ORR and their surrogacy in relation to tumor burden using individual data.

Patient eligibility
Between January 2010 and May 2016, 3794 consecutive patients with unresectable HCC undergoing TACE at 17 Chinese tertiary centers were screened. HCC was diagnosed per AASLD and EASL guidelines (1,2). Inclusion criteria were: (1) treatment-naïve; (2) at least one nodule > 1 cm; (3) Eastern Cooperative Oncology Group performance status (ECOG PS) score 0-1; and (4) Child-Pugh score A5-B7. Exclusion criteria were: (1) vascular invasion or extrahepatic spread; (2) spontaneous tumor rupture; (3) other malignancies; (4) combined systemic or loco-regional therapy; (5) absence of baseline or follow-up imaging; and (6) decompensated portal hypertension. The diameter of the largest tumor (tumor size, Ts) and the tumor number (Tn) were measured on dynamic contrast-enhanced computed tomography (CT) or magnetic resonance (MR) images by two independent central reviewers (WB and DX), both of whom were blinded to patients' clinical data. Tumor burden was defined as Ts + Tn, in accordance with the "six-and-twelve" score from our previous study: a sum ≤ 6, a sum > 6 but ≤ 12, and a sum > 12 indicated low, intermediate, and high tumor burden, respectively (18).

Treatment procedure
TACE procedures were performed by investigators with at least 8 years of experience. Specifically, the chemotherapeutic drugs mixed with lipiodol (3-30 ml) included doxorubicin (10-50 mg), cisplatin (10-110 mg), epirubicin (10-50 mg), or oxaliplatin (100-200 mg), and were selected according to the practice of each center. Gelatin sponge or polyvinyl alcohol foam particles were introduced for embolization, which was preferably super-selective.
When necessary in treating large or multiple tumors, split TACE was allowed within 4-6 weeks of the first TACE session. Sequential TACE was conducted with the same agents on an "on-demand" schedule when follow-up CT/MRI identified intrahepatic viable tumors or intrahepatic recurrences, provided that patients' performance status and laboratory findings permitted. TACE failure/refractoriness was defined as the appearance of vascular invasion or extrahepatic spread, deterioration of liver function to Child-Pugh grade C, or consistent progressive disease (PD) after three sessions of TACE within 6 months (19,20). Subsequent treatment was recommended for patients with TACE failure/refractoriness and administered according to patients' choice and consent.
Evaluation and definitions of the treatment response
Radiological response was routinely assessed 4-6 weeks after the first TACE procedure using dynamic contrast-enhanced CT or MRI to obtain the initial response, and every 4-12 weeks thereafter (depending on each center's practice, patient characteristics and response) for subsequent assessment using a consistent imaging modality. Initial response was defined as the radiological response after the first session of TACE, regardless of the schedule of cycles (7,8,14,21,22). Best response was defined as the best radiological result across all time points during "on-demand" repeated sessions (14-16). In cases where the same response was maintained through sessions of the procedure without improvement, the initial response was regarded as the best response. All response assessments at the individual patient level, including initial and best response, were made by two independent central reviewers (WB and DX) blinded to other clinical data to avoid bias, and a final consensus was reached with another reviewer (EW) when the two reviewers had different judgements. Complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD) were defined by mRECIST criteria (6,23). For analysis, the overall response (combined responses of target and non-target lesions) was adopted in our study. OR referred to the sum of CR and PR, whereas non-response referred to the sum of SD and PD. Patients were also divided into four groups according to the occurrence and timing of objective response, as follows: (1) patients who achieved CR after the first TACE (referred to as "initial CR"); (2) patients who subsequently achieved CR after at least two sessions (referred to as "subsequent CR"); (3) patients who achieved PR after multiple sessions of TACE (referred to as "subsequent PR"); and (4) patients who did not achieve objective response during the treatment course (referred to as "persistent non-response").
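The definitions of initial response, best response, and OR above can be illustrated with a minimal Python sketch. It assumes each patient's overall mRECIST assessments are available as a chronological list whose first entry is the assessment after the first TACE session; the function names and the three status labels are hypothetical choices for illustration, not part of the study's analysis code.

```python
from typing import List

# mRECIST categories ranked from best to worst (ranking used only in this sketch).
RANK = {"CR": 0, "PR": 1, "SD": 2, "PD": 3}


def initial_response(assessments: List[str]) -> str:
    """Response after the first TACE session (assumed to be the first entry)."""
    return assessments[0]


def best_response(assessments: List[str]) -> str:
    """Best radiological result across all time points."""
    return min(assessments, key=lambda r: RANK[r])


def is_objective_response(response: str) -> bool:
    """OR is the sum of CR and PR; SD and PD count as non-response."""
    return response in ("CR", "PR")


def responder_status(assessments: List[str]) -> str:
    """Label a patient by when (if ever) OR was achieved."""
    if is_objective_response(initial_response(assessments)):
        return "initial responder"
    if is_objective_response(best_response(assessments)):
        return "subsequent responder"
    return "persistent non-responder"


# Hypothetical patient: stable disease after the first session, PR later on.
print(responder_status(["SD", "PR", "PR"]))
```

When the same category is maintained throughout, `best_response` returns the initial category, which matches the rule that the initial response is then regarded as the best response.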

Statistics
Missing values were imputed with 5 independent draws. Quantitative variables were compared with Student's t-test or the Mann-Whitney U test, whereas categorical variables were compared with the chi-squared test or Fisher's exact test. OS, defined as the time interval between the initial TACE and all-cause death, was calculated with the Kaplan-Meier method and compared by log-rank test. Patients who were alive at the last follow-up date (December 15th, 2017) or lost to follow-up were censored. Cox proportional hazards regression was performed to identify prognostic factors and to estimate hazard ratios (HR) with 95% confidence intervals (95% CI). Notably, in the Cox regression analyses, evolutionary events were addressed as time-dependent covariates. Multivariate binary logistic regression was used to identify predictors of non-response after initial TACE. Interactions between significant predictors and initial/best ORR were tested. The relationship between the survival probability (in deciles) and the odds of initial/best ORR (p/(1 - p), where p is the initial/best ORR) was evaluated using Pearson's correlation coefficient R and linear regression. Since an ORR of 100% would result in odds of positive infinity, the corresponding survival decile groups were integrated into the adjacent group until the integrated ORR became less than 100%, and the overall odds of this integrated group were calculated accordingly (24), resulting in fewer than 10 decile groups in some analyses. The significance level was two-sided P < 0.05. Statistical analyses were conducted using SPSS version 22 (IBM, Somers, New York, USA) and R version 3.2.5.
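The handling of decile groups with an ORR of 100% can be sketched as follows. This is an illustrative reconstruction, not the code used in the study (which was run in SPSS and R): the text does not state which neighbouring group receives a saturated group, so the sketch assumes it is folded into the preceding group (or absorbs the following one when it comes first). The function names and the example counts are hypothetical.

```python
import math
from typing import List, Tuple

Group = Tuple[int, int]  # (responders, patients) within one survival decile group


def merge_saturated(groups: List[Group]) -> List[Group]:
    """Fold groups with 100% ORR into a neighbour until each ORR is below 100%.

    Assumption: a saturated group is merged into the preceding group; a
    saturated first group absorbs the groups that follow it.
    """
    merged: List[Group] = []
    for responders, patients in groups:
        prev_saturated = bool(merged) and merged[-1][0] == merged[-1][1]
        if merged and (responders == patients or prev_saturated):
            prev_r, prev_n = merged.pop()
            merged.append((prev_r + responders, prev_n + patients))
        else:
            merged.append((responders, patients))
    return merged


def odds_of_response(responders: int, patients: int) -> float:
    """Odds p / (1 - p) with p = responders / patients."""
    if responders == patients:
        raise ValueError("ORR of 100% gives infinite odds; merge groups first")
    return responders / (patients - responders)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts: the last group has 10/10 responders and is merged.
groups = merge_saturated([(2, 10), (4, 10), (7, 10), (10, 10)])
odds = [odds_of_response(r, n) for r, n in groups]
print(groups, odds)
```

After merging, the resulting odds would be paired with the survival probability of each (possibly integrated) group and passed to `pearson_r`; how the survival probability of an integrated group is computed is not specified in the text and is left out of the sketch.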

Patient characteristics
A total of 1549 patients were finally included (Supplementary Fig. 1). As shown in Table 1, the median age was 57 (interquartile range, IQR: 48-65) years, and 1047 (67.6%) patients presented with ECOG PS 0. The median Ts was 6.1 (IQR: 3.9-9.6) cm, and 895 (57.8%) patients had a single nodule. Approximately 95% of patients were graded Child-Pugh A.

Stratified analyses of prognostication of ORR
As predictors of ORR, both tumor burden and ECOG PS score interacted with initial/best ORRs in predicting OS (Supplementary table 4). Consequently, the prognostic value of ORR was further tested in subgroups where tumor burden (Ts + Tn) was stratified into low (≤ 6), intermediate (> 6 but ≤ 12), and high (> 12) according to the previously proposed "six-and-twelve" score (18), and ECOG PS score was stratified into 0 and 1 point.
In terms of stratified survival, initial responders had significantly better survival than initial non-responders in low (median OS 47.  Fig. 1B, D, and F). Regarding PS score strata, survival probabilities differed significantly between responders and non-responders regardless of PS score and timing of assessment (all with log-rank P < 0.001, Supplementary Fig. 4). Such consistency between initial and best ORR across ECOG PS 0 and 1 indicates that PS did not discriminate the timing of ORR assessment; therefore, further PS-related analyses were not performed.

ORR by mRECIST as a potential surrogate endpoint
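The "six-and-twelve" stratification used for these subgroups reduces to two cut-offs on the sum of tumor size and tumor number. A minimal sketch, assuming the largest tumor diameter is given in centimetres and using a hypothetical function name:

```python
def tumor_burden_stratum(ts_cm: float, tn: int) -> str:
    """Classify tumor burden by the six-and-twelve score (Ts + Tn).

    ts_cm: diameter of the largest tumor in cm (Ts).
    tn: number of tumors (Tn).
    """
    score = ts_cm + tn
    if score <= 6:
        return "low"
    if score <= 12:
        return "intermediate"
    return "high"


# A patient with the cohort's median Ts (6.1 cm) and a single nodule.
print(tumor_burden_stratum(6.1, 1))
```

Both boundary values belong to the lower stratum: a sum of exactly 6 is low and a sum of exactly 12 is intermediate.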

Discussion
The current study indicates that the surrogacy of initial and best ORR is associated with tumor burden: low/intermediate tumor burden does not discriminate the prognostic value of initial or best ORR, whereas high tumor burden compromises the sensitivity of initial ORR, so that best ORR appears to be superior. The strengths and novelty of this study are that it is (1) a nationwide multicenter study with the largest sample size focusing on the prognostic value of ORR after TACE in unresectable HCC with a wide range of tumor burden; (2) the first in-depth investigation of the interaction between tumor burden and initial/best ORR; and (3) the first study utilizing individual patient data to comprehensively analyse the surrogacy of ORR with a long follow-up period.
The median OS of 28 months in the current cohort was comparable to the 30 months reported by the EASL guideline (1). However, compared with our previous result of 33 months, the slightly inferior prognosis of the current population might be due to the inclusion of over thirty percent of patients with ECOG PS 1 (18). These patients were included in order to test our hypothesis on the timing of ORR as a surrogate within a more generalized population, rather than in the optimal TACE candidates studied in our previous research. Since a criterion for stratification was needed in the current study to compare initial and best ORR in patients with different conditions, we adopted the "six-and-twelve" score as the basis of stratification because of its adequate performance in the Chinese patient population compared with other models (18,25,26). Moreover, although this study used the major conclusions of our previous "six-and-twelve" study as a basis for the principal analyses, the current research serves the broader purpose of individualizing the timing of response assessment and providing a new answer to the controversial question of whether initial or best response is superior, rather than evaluating prognosis at pre-TACE baseline without information on radiological response.
Qualified surrogates of OS should be accurate and valid, with sufficient sensitivity to capture all net survival benefits (27,28). However, previously reported ORRs vary from 52% to 84% (Supplementary table 14) (6). Apart from the heterogeneity in TACE technique and study population, the timing of ORR assessment might also account for this discrepancy: in the current study, about 1 in 7 initial non-responders (15.0%) became responders after repeated TACE (Supplementary table 2; 82 of 547 initial non-responders achieved response after repeated TACE). This is particularly relevant given the similar outcomes of patients with initial and subsequent CR, and the significant difference between patients with subsequent response and persistent non-responders. It can be inferred that best response is indispensable in addition to initial response if all net benefits are to be captured (3).
Nevertheless, correlations between initial/best ORRs and survival were similar (R = 0.93, P < 0.001 vs. R = 0.82, P = 0.004), indicating that overall analyses could not determine the optimal timing of response and that patient stratification is necessary. Upon identification of the interaction between initial/best ORR and tumor burden, we performed stratified analyses in low (≤ 6), intermediate (> 6 but ≤ 12), and high (> 12) tumor burden (18). Interestingly, the relative difference between initial and best response increased considerably as tumor burden increased (3.0%, 4.9%, and 38.5%, respectively, for initial/best ORR) (Table 3). Correspondingly, the correlation between best ORR and survival was similar to that of initial ORR in the low and intermediate tumor burden strata (0.97 vs. 0.84 and 0.90 vs. 0.74), but became clearly better in the high tumor burden stratum (0.70 vs. 0.22). This discrepancy may also explain why similar prognostic values of initial and best ORR were reported by a previous study: fewer than four nodules in 80% of patients and a median tumor diameter of 3.1 cm indicated a predominance of low/intermediate tumor burden. In contrast, 361 patients (23%) in the current study had high tumor burden, enabling the identification of the discrepancy between initial and best ORRs (14).
The most probable explanation is a "ceiling effect" and a "threshold effect". For patients with low/intermediate tumor burden, initial TACE alone might reach the "ceiling" of effectiveness with adequate tumor necrosis, which is supported by the observation that previous studies reporting an initial ORR of 52%-100% had a median or mean tumor diameter of 2.7 to 6.2 cm (Supplementary table 15). For patients with high tumor burden, however, one session is hardly sufficient to step over the "threshold" of adequate necrosis, and a greater disparity between initial and best response rates can be expected after two or more sessions. This hypothesis is supported by a previous study in which initial non-responders tended to have higher tumor burden and presented a subsequent response rate approaching 50% after the second TACE (15). Similarly, in our previous study on TACE plus sorafenib, where 44% of patients had a tumor diameter > 10 cm, 57% of patients had received two or more sessions at the optimal timing of response evaluation (9). Notably, in patients whose response status changed, the mean time to best response was 112 days, meaning that 1 or 2 further sessions were conducted after the initial treatment. If response status has not improved within this period after repeated TACE, alternatives may be considered.
Our findings suggest the need for a trade-off between the timeliness and the sensitivity of ORR at different time points. For patients with Ts + Tn ≤ 12, the prognostic values of initial and best ORRs were almost equivalent. Seeking subsequent OR in initial non-responders may be difficult and would waste the early availability of initial evaluation; thus initial ORR should be preferred. For patients with Ts + Tn > 12, only best ORR has qualified surrogacy, and the timeliness of initial ORR should not outweigh its inaccuracy. In this case, best response should be adopted.
Furthermore, it should be noted that radiological response per se is not only a surrogate endpoint but also the basis for scheduling on-demand TACE. The current findings indicate that if initial OR cannot be achieved with TACE alone despite a low/intermediate tumor burden, an early decision to resort to additional therapies or other treatments is worthy of consideration, since repeated TACE is unlikely to bring further improvement of ORR. Moreover, survival does not differ between non-responders to TACE and untreated patients; thus repeated TACE may not be recommended in cases where OR cannot be achieved by prior TACE (29). However, for patients with high tumor burden, initial non-response to TACE does not necessarily negate this first-line non-curative treatment, since subsequent response remains possible with repeated TACE and may bring survival benefits. Nonetheless, the substantially improved ORR after subsequent TACE still remains suboptimal, and OS is still unfavorable (18); thus alternative therapies, such as targeted therapy, immunotherapy, TACE combined with systemic therapy, or other modalities, may be options worthy of consideration for those patients to improve outcomes (30).
Of note, preservation of liver function is as important as achieving a high OR, because the goal of treatment is to prolong OS. Initial or repeated TACE might deteriorate liver function, especially in patients with high tumor burden, and this might be one of the reasons for the poor prognosis of those patients. However, in the current study focusing on ORR as a surrogate of OS in patients receiving TACE (1), heterogeneity in compensated liver function (restricted to Child-Pugh A5-B7) and PS (restricted to ECOG PS 0-1) was less significant than the wide range of tumor burden. These might be the reasons why liver function was not a qualified predictive factor of OR in the current cohort (Supplementary table 3), and why different performance status scores could not discriminate the surrogacy of initial and best OR in the current study (Supplementary Fig. 4). Moreover, response assessment by mRECIST criteria is mainly based on tumor burden (6,23), and on-demand TACE is scheduled according to viable tumors (which are very likely influenced by tumor burden) rather than liver function or PS, as long as these are sufficiently preserved for tolerance. This difference may explain why OR and its timing are more dependent on tumor burden and less on liver function or PS.
There are some limitations in our study. Firstly, selection bias was unavoidable due to its retrospective nature. However, the inclusion of consecutive cases from multiple centers with a large sample size may minimize this risk. Secondly, the tumor characteristics and etiologies of Chinese patients might differ from those in Europe and America; however, the subgroup analysis suggested that our findings were consistent in patients with etiologies other than HBV. Still, we acknowledge that further investigations with large samples of other etiologies in different regions are necessary to confirm our results. Thirdly, other predictors that were not included or detected in our study, such as chemotherapy agents and genetic or other biological variables, might be more significant and specific. Thus, our findings should be generalized with caution. Finally, future trial-level data are needed to confirm these findings (27,28).

Conclusions
In summary, the optimal timing of ORR assessment should be tailored according to tumor burden. For patients with Ts + Tn ≤ 12, initial ORR is optimal for its timeliness, given its similar sensitivity to best ORR.

Figure: (A) HR for initial and best response in predicting OS within the low tumor burden stratum. Both initial and best response presented robust prognostic value across most subgroups. (B) HR for initial and best response in predicting OS within the intermediate tumor burden stratum. Both initial and best response presented robust prognostic value across most subgroups. (C) HR for initial and best response in predicting OS within the high tumor burden stratum. Initial response was not predictive of OS in any subgroup, whereas best response remained robust in most subgroups.