In accordance with the CRUK/EORTC imaging biomarker consensus statement, this was a two-centre Domain 2 validation study evaluating performance characteristics, reproducibility and whether the biomarker is ‘fit for purpose’ (15).
Patients and treatment
We performed a case-control study at two UK centres, the Christie NHS Foundation Trust, Manchester, and Leeds Teaching Hospitals Trust, Leeds (LTHT). Patients were included if they had histologically confirmed SCCA; T1 to T4 disease (AJCC 7th edition) (16); and had received CRT with curative intent for non-metastatic disease. For the control group, patients were free of LRF for at least three years of follow-up. Patients with histologies other than SCC were excluded, as were patients in whom T-stage was undeterminable (Tx disease), as mrTV and mrT-size parameters could not be quantified.
All patients were treated between 2007 and 2014 prior to the introduction of IMRT (Intensity Modulated Radiotherapy). The treatment protocol followed that used in the ACT II trial (17) – namely, radiotherapy of 50·4 Gy was delivered over 5·5 weeks with a two-phase technique, without a mandatory break. Phase 1 included 30·6 Gy in 17 daily fractions with non-conformal rectangular parallel-opposed fields. Phase 2 required conformal planning and delivered 19·8 Gy in 11 daily fractions over 15 days to the primary tumour with a 3 cm margin and any involved lymph nodes. Chemotherapy regimens were administered concurrently with radiotherapy as either: mitomycin-C (MMC) 12 mg/m² on day 1, and continuous infusion of 5-fluorouracil (5-FU) 1000 mg/m² on days 1-4 and days 29-32.
Selection for case-control study
From the retrospective two-centre clinical databases, all 40 patients with LRF from 2007 to 2014 undergoing CRT and with measurable anal tumours were selected as cases (one outlier volume was later excluded). Forty-one patients without LRF at 3 years were controls. Controls were selected at random from those patients in the databases with available MR images satisfying the criteria listed below.
Tumour volume quantification
All tumour quantification used routinely collected pre-treatment MR imaging, performed on a 1.5 Tesla MR scanner employing an optimal pelvic phased-array body coil (acquisition protocols are detailed in the supplementary material, Table S1). For inclusion, scans had to meet the following criteria: (i) include a small field of view high resolution T2-weighted (T2W) sequence in the axial plane as a minimum, with a slice thickness ≤4 mm; (ii) field of view extending above and below the tumour in two orthogonal planes to allow complete tumour assessment including TV quantification and assessment of T-size; and (iii) where the tumour required more than one series to assess the whole tumour, these series had to overlap sufficiently such that the entire tumour was imaged and could be quantified.
To determine mrT-size, three primary orthogonal measurements of the maximal diameters were taken in the anterior-posterior (AP), left-right (LR) and cranio-caudal (CC) planes along the axis of the tumour measured on the high resolution T2W images. The AP and LR diameters were recorded on the axial plane at the point of maximal dimension. The longest diameter was noted and the next dimension was taken at an axis perpendicular to the above. CC dimension was measured in either the coronal or sagittal plane. The largest of these three diameters was considered to be the tumour size (cm). Assessors (RK and BC) were blind to LRF status.
mrTV measurements were performed using World-Match (in-house written software from MvH (18)) that allowed simultaneous contouring on several sequences of different planes. Pre-treatment MR images were imported in anonymised DICOM format and the primary tumour was manually contoured. Delineations were checked and adjusted accordingly using coronal and sagittal planes. All contiguous areas of tumour were contoured together including nodal masses that had coalesced with the tumour or contiguous areas of extra-mural vascular invasion (EMVI). This was required to account for difficulties in defining a plane between the entities and to allow consistency of approach. Separate or discrete nodal volumes were not included. The assessor (HS) was blind to LRF status.
For the main analysis, mrTV was derived using a summation of areas method (Volsum), which sums the areas contoured on serial image slices while taking into account the distance between the slices, i.e. the slice thickness of the scans (Figure 1A & 1B). The time taken to contour the TV on MR images was recorded for the first 22 patients (‘training’) and compared with the remaining patients.
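As a minimal sketch of the summation of areas idea, the Python snippet below multiplies the total contoured area by the slice spacing. The function name `volsum`, the example areas and the 0.4 cm spacing are illustrative assumptions, not values or code from the study, and the sketch assumes uniform spacing with no inter-slice gap.

```python
def volsum(areas_cm2, slice_spacing_cm):
    """Summation-of-areas volume in cm^3.

    areas_cm2: contoured tumour area on each consecutive slice (cm^2).
    slice_spacing_cm: distance between slices (cm), assumed uniform here.
    """
    if slice_spacing_cm <= 0:
        raise ValueError("slice spacing must be positive")
    # Each slice contributes a slab of (area x spacing); sum the slabs.
    return sum(areas_cm2) * slice_spacing_cm


# Hypothetical contour areas on five axial slices, 4 mm (0.4 cm) apart.
example_areas = [1.2, 2.5, 3.1, 2.0, 0.8]
example_volume = volsum(example_areas, 0.4)
print(f"Volsum = {example_volume:.2f} cm^3")
```

Scans with variable spacing would need a per-slice spacing list rather than a single value.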
Intra- and inter-observer variability
Intra- and inter-observer variability of mrTV quantification was assessed in ten randomly selected patients using intra-class correlation coefficients (ICC), with scores <0.5 representing poor; 0.5 to <0.75 moderate; 0.75 to <0.9 good; and ≥0.9 excellent agreement (19). The 95% confidence intervals were derived from z-transformations. Bland-Altman plots were employed to further assess agreement (20). We standardised all plots so that the y-axis range (difference between measurement modalities) was equivalent to that of the x-axis (mean value from both measurement modalities). We then examined each plot for: (i) closeness of the mean difference to zero; (ii) limits of agreement; and (iii) whether the pattern across the range of means was proportionate, i.e. evaluating for trends across the range. We expressed the limits of agreement as a percentage of the x-axis range of values and reported these as ‘wide’ if the limits fell outside ± 10% of the average mean difference.
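The agreement checks can be illustrated with a short Python sketch. It assumes the conventional Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations of the differences), which the text does not state explicitly, and it encodes the ICC bands quoted above. The function names and the paired measurements are hypothetical.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit, pairwise means).

    Assumes the conventional limits: bias +/- 1.96 x SD of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd, means


def icc_category(icc):
    """Map an ICC value to the agreement bands used in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Hypothetical repeat volume measurements (cm^3) on three tumours.
obs_a = [10.0, 12.0, 14.0]
obs_b = [9.0, 13.0, 13.0]
bias, lower, upper, pair_means = bland_altman(obs_a, obs_b)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
print(icc_category(0.82))
```

The ICC itself is not computed here because several ICC forms exist and the text does not say which was used.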
Ellipsoid and Elliptical Cylinder Volume estimation
We tested whether volume estimation derived from 1D tumour dimensions is a valid estimate of TV, using ellipsoid and elliptical cylinder equations and measured 1D diameters (Figure 1C & 1D), where $d_1$, $d_2$ and $d_3$ are the maximal diameters of the primary tumour measured in the AP, LR and CC planes. The ellipsoid equation was: $V = \frac{\pi}{6} d_1 d_2 d_3$; the elliptical cylinder equation was: $V = \frac{\pi}{4} d_1 d_2 d_3$. These were assessed for reproducibility as above. We also evaluated their accuracy compared with Volsum using the ROCAUC method (we expected the ellipsoid and elliptical cylinder estimates to be equivalent as they are derived from the same parameters).
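For illustration, the two estimates can be computed as below, assuming the standard geometric formulas expressed in full diameters rather than radii: π/6 × d1 × d2 × d3 for the ellipsoid and π/4 × d1 × d2 × d3 for the elliptical cylinder. The function names and example diameters are hypothetical.

```python
import math


def ellipsoid_volume(d1, d2, d3):
    """Ellipsoid volume from three orthogonal diameters: (pi/6) * d1 * d2 * d3."""
    return math.pi / 6 * d1 * d2 * d3


def elliptical_cylinder_volume(d1, d2, d3):
    """Elliptical cylinder volume: elliptical base (pi/4) * d1 * d2 times height d3."""
    return math.pi / 4 * d1 * d2 * d3


# Hypothetical AP, LR and CC diameters in cm.
ap, lr, cc = 3.0, 2.5, 4.0
v_ell = ellipsoid_volume(ap, lr, cc)
v_cyl = elliptical_cylinder_volume(ap, lr, cc)
print(f"ellipsoid={v_ell:.2f} cm^3, cylinder={v_cyl:.2f} cm^3")
```

Under these formulas the cylinder estimate is always 1.5 times the ellipsoid estimate, so the two rank tumours identically.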
Statistical Analysis
Stata software, version 14 (StataCorp, TX, USA) was used for all statistical analyses. Continuous data were summarised as medians with inter-quartile ranges (IQR) and categorical data were presented as proportions. Continuous data were compared using non-parametric Mann-Whitney U tests, and categorical data using Chi-square or Fisher’s Exact tests.
The discriminatory potential of the different tumour quantifications was assessed with receiver operating characteristic (ROC) curves, with accuracy estimated using the AUC; AUCs were compared against each other using the method of DeLong et al. (21). Multivariable models used logistic regression, and ROCAUC was derived using post-estimation commands.
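The AUC can be read as the probability that a randomly chosen case has a higher measurement than a randomly chosen control. The Python sketch below computes it directly from that pairwise definition, counting ties as one half. It is not the Stata routine used in the study and does not implement the DeLong comparison; the function name and data are hypothetical.

```python
def roc_auc(case_values, control_values):
    """Empirical AUC: share of case-control pairs in which the case scores
    higher than the control, with ties counted as one half."""
    if not case_values or not control_values:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical tumour volumes (cm^3) for LRF cases and LRF-free controls.
cases = [30.0, 45.0, 60.0]
controls = [10.0, 30.0, 20.0]
auc = roc_auc(cases, controls)

# Rescaling every value by the same positive constant keeps the ranking,
# and therefore the AUC, unchanged.
auc_scaled = roc_auc([1.5 * v for v in cases], [1.5 * v for v in controls])
print(f"AUC = {auc:.3f}, after rescaling = {auc_scaled:.3f}")
```

Because the AUC depends only on rank order, two measures that differ by a constant positive factor give the same AUC.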
In our power calculation, we posited that a 10% difference in ROCAUC would be clinically meaningful. We added 3% because case-control studies tend to overestimate performance characteristics (22). Thus, we concluded that 42 patients with LRF (cases) and 42 patients without LRF (controls) would be required to detect a ROCAUC difference of 13% at α < 0.001.
We utilised other indicators of performance characteristics, using the methods described by Pencina et al. (23), which derive two characteristics: the Integrated Discrimination Improvement (IDI), an index of improvements in sensitivity relative to specificity, and the Net Reclassification Improvement (NRI), an index of net change in events versus non-events detected, which in turn focuses on medical decision making.
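A rough Python sketch of both indices follows, using the commonly cited definitions: IDI as the gain in mean predicted risk among events minus the gain among non-events, and a category-based NRI as net upward reclassification among events plus net downward reclassification among non-events. The risk cut-off of 0.5, the function names and the predicted probabilities are illustrative assumptions; the text does not specify which cut-offs, if any, were used.

```python
from bisect import bisect_right
from statistics import mean


def idi(p_old, p_new, event):
    """Integrated Discrimination Improvement between two risk models."""
    ev = [i for i, e in enumerate(event) if e]
    ne = [i for i, e in enumerate(event) if not e]
    gain_events = mean([p_new[i] for i in ev]) - mean([p_old[i] for i in ev])
    gain_nonevents = mean([p_new[i] for i in ne]) - mean([p_old[i] for i in ne])
    return gain_events - gain_nonevents


def nri(p_old, p_new, event, cutoffs):
    """Category-based Net Reclassification Improvement.

    cutoffs: sorted risk thresholds defining the categories (assumed here).
    """
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, e in zip(p_old, p_new, event):
        # Positive move means reclassified into a higher risk category.
        move = bisect_right(cutoffs, new) - bisect_right(cutoffs, old)
        if e:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted LRF risks from a baseline and an extended model.
events = [1, 1, 0, 0]
risk_old = [0.4, 0.6, 0.4, 0.6]
risk_new = [0.7, 0.8, 0.2, 0.3]
idi_value = idi(risk_old, risk_new, events)
nri_value = nri(risk_old, risk_new, events, cutoffs=[0.5])
print(f"IDI = {idi_value:.2f}, NRI = {nri_value:.2f}")
```

Both functions require at least one event and one non-event; otherwise the means and denominators are undefined.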