Automatic buccal tumor segmentation
Cone-beam computed tomography (CBCT), a form of CT imaging commonly used in dentistry and radiation therapy, can quickly acquire 3D images with high resolution in all three dimensions, which is desirable for accurate volume measurement. As a proof of concept for the use of our CBCT images for volume measurement, we scanned a collection of small aliquot tubes containing known volumes of water and used a Python script to segment the image (Supplementary Figure 1A, B). Volumes calculated from the number of voxels in each region identified as water were in very good agreement with volumes expected based on weighing the water when it was pipetted into the tubes (Supplementary Figure 1C). In this case, image segmentation was achieved by thresholding (voxel values in water are higher than in air or plastic), some processing of the binary image to correct the consequences of noise and blurring, and separation of the water-density voxels into connected components. While similar methods are also applicable to mice, tumors are not readily distinguishable from other soft tissue in CT images, so a more complicated approach was required to segment them (Figure 1A).
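As a minimal illustration of the threshold-and-count idea described above, the following sketch labels 6-connected groups of above-threshold voxels and converts each voxel count to a volume. It is not the script used in this work: the function names, the nested-list image layout, and the choice of 6-connectivity are assumptions made for the example, and the clean-up of the binary image for noise and blurring is omitted.

```python
from collections import deque

# Face-sharing (6-connected) neighbour offsets in (z, y, x) order.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def component_sizes(mask):
    """Voxel counts of the 6-connected components of True voxels.

    `mask` is a nested list indexed as mask[z][y][x] (hypothetical layout).
    Components are returned in the order they are first met in a z, y, x scan.
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    sizes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from this unvisited seed voxel.
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                count = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    count += 1
                    for dz, dy, dx in STEPS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and mask[n[0]][n[1]][n[2]] and n not in seen:
                            seen.add(n)
                            queue.append(n)
                sizes.append(count)
    return sizes


def component_volumes(image, threshold, voxel_volume_mm3):
    """Threshold the image, then report one volume per connected component."""
    mask = [[[value > threshold for value in row] for row in plane] for plane in image]
    return [count * voxel_volume_mm3 for count in component_sizes(mask)]
```

For scans of realistic size, an array library would be the practical choice; the standard-library version is shown only to keep the logic explicit.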
Using modified stages and anesthesia setups (Figure 1B, Supplementary Figure 1D), we were able to acquire CBCT scans of multiple mice at a rate comparable to caliper measurement (Supplementary Figure 1E), making this a viable alternative for tumor monitoring at scale. Pre-processing of the resulting scans was then required to separate individual mice (Figure 1C) and remove non-mouse objects (Figure 1D). The method we developed to segment each mouse exploits the symmetry of the head to identify tissue on one side that does not correspond to tissue on the other. This left-right mapping is achieved by fitting a curvilinear coordinate system that bends with the neck and jaw. In contrast to the natural coordinate system of voxel indices (Supplementary Figure 2A), where each voxel represents the same volume of space, the curved coordinates define a grid where the volume of real space represented by each point varies. To account for this, volumes in the curved space were calculated by adding up volumes of 24 tetrahedra per voxel, arranged to count all real space exactly once (Supplementary Figure 2B).
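The volume bookkeeping for a curved grid can be sketched as follows. This example assumes one particular decomposition of a deformed grid cell (the cell center joined to each of the six face centers and to each pair of adjacent face corners), which yields 24 tetrahedra per cell; it is not necessarily the arrangement shown in Supplementary Figure 2B, and the function names and corner ordering are hypothetical.

```python
def tet_volume(p0, p1, p2, p3):
    """Unsigned volume of the tetrahedron with vertices p0..p3."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0


def mean_point(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Corner k sits at grid offset (k & 1, (k >> 1) & 1, (k >> 2) & 1).
# Each face lists its four corners in cyclic order around the face.
FACES = [
    (0, 2, 6, 4), (1, 3, 7, 5),  # faces of constant first index
    (0, 1, 5, 4), (2, 3, 7, 6),  # faces of constant second index
    (0, 1, 3, 2), (4, 5, 7, 6),  # faces of constant third index
]


def cell_volume(corners):
    """Real-space volume of one grid cell given its 8 corner positions.

    Sums 6 faces x 4 tetrahedra = 24 tetrahedra. Unsigned volumes are used,
    so the result is only meaningful for cells that are not folded over.
    """
    center = mean_point(corners)
    total = 0.0
    for face in FACES:
        pts = [corners[k] for k in face]
        face_center = mean_point(pts)
        for k in range(4):
            total += tet_volume(center, face_center, pts[k], pts[(k + 1) % 4])
    return total
```

Because neighbouring cells share the same face center and face corners, a decomposition of this kind leaves no gaps or overlaps between adjacent cells, which is the property needed to count all real space exactly once.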
We used various image processing techniques, including thresholding, region growing, registration of small blocks of voxels, and filters to detect specific features, to first segment teeth (Supplementary Figure 2C), then bone (Figure 1E), and label various consistently identifiable points.
One measurer tended to record lower volumes than the others, achieving good agreement with volumes from CBCT segmentation (Figure 2B); this measurer collected all other caliper measurements reported in this paper. Importantly, this measurer's caliper results were less similar to automatic CBCT analysis in other experiments (Supplementary Figure 3A). Differences between measurers are a significant problem with the caliper method, resulting in obvious inconsistencies if one person does not collect all caliper measurements for a given experiment. Variability between caliper measurements of a given tumor tended to be far greater than variability between CBCT segmentation volumes from different scans, especially for small tumors (Figure 2C). It appears that random errors in volume measurement from CT are roughly proportional to volume, whereas a significant component of the variability between caliper measurements is independent of volume (Figure 2D).
We similarly compared results from automatic segmentation to manual segmentation. For this experiment, we acquired contrast-enhanced CBCT scans of 15 large tumors. Three different styles of manual contouring (Figure 2E, F) were performed by two researchers using ITK-SNAP software, and measured distances approximating caliper measurement were included for additional comparisons (Supplementary Figure 3B). We observed poor agreement between these methods (Figure 2E). Despite visible differences between the contouring styles, it is worth noting that all of the segmentations look similar and reasonably accurate when overlaid on the scans. This suggests that manual (or semi-automatic) image segmentation would require very rigidly defined procedures to give reproducible results. This approach is also too time-consuming to present a viable alternative to automatic segmentation for continual monitoring of many tumors.
Comparison of tumor volumes from image segmentation to caliper measurements
While a linear relationship between tumor volumes from automatic CBCT segmentation and caliper measurements by the primary tumor measurer was observed when tumors were measured at a single timepoint (Figure 2B, “Measurer One”), this was not observed when caliper measurements and CBCT scans were collected in parallel over the course of real experiments (Supplementary Figure 3A). Two such experiments were conducted, and all matched volume pairs (CT and caliper measurements of the same tumor on the same day) are shown in Figure 3A with linear and quadratic fits. It appears that caliper volumes less than 200 mm³ were nearly always overestimates and caliper volumes greater than 200 mm³ were nearly always underestimates. A coefficient of only 0.57 for the first-order term of the linear fit is particularly concerning, as it suggests that caliper measurement substantially underestimates changes in tumor volume.
The same data are shown in Figure 3B, but with linearly interpolated connecting lines between points for each mouse. These trajectories are not entirely random; a tumor measured once to have a low caliper volume relative to CT volume tends to have a low caliper volume at the next measurement as well. Another important observation about these results is the large number of caliper measurements near 100 mm³ that correspond to CT measurements close to zero. From overlaid histograms of these volumes (Figure 3C), we see that the dataset is composed primarily of such measurements. A Bland-Altman plot reveals strong size-dependence of disagreement between the two methods (Figure 3D). Both methods occasionally reported tumor volumes of exactly zero, although usually not for the same tumors (Figure 3E).
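For readers unfamiliar with the Bland-Altman construction, the quantities plotted can be computed as in the sketch below. The function name is hypothetical, the 1.96 standard deviation limits of agreement are the conventional choice rather than something stated in this paper, and the volumes in the usage lines are invented solely to show the call.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Per-pair means and differences, plus bias and 95% limits of agreement.

    method_a and method_b are equal-length sequences of paired measurements
    (for example caliper and CT volumes of the same tumor on the same day).
    """
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                  # average disagreement
    half_width = 1.96 * stdev(diffs)    # conventional 95% band
    return means, diffs, bias, (bias - half_width, bias + half_width)


# Invented example values in mm^3, not data from this study.
caliper = [100.0, 120.0, 200.0, 300.0]
ct = [10.0, 60.0, 200.0, 420.0]
means, diffs, bias, limits = bland_altman(caliper, ct)
```

Size-dependent disagreement of the kind described above appears as a trend in the differences when they are plotted against the pair means, rather than as a constant offset.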
Effects of experimenter expectations on caliper data
One notable feature of the caliper growth curves is that several abruptly drop from about 100 mm³ to zero at day 60, when it was assumed that the surviving mice had been cured. A mouse with a tumor that would be measured as having a volume of 100 mm³ is, in fact, not necessarily distinguishable from a mouse with no tumor at all. At small volumes, the tumor is not superficial but somewhere in the cheek muscle, making caliper measurement unreliable. Furthermore, caliper measurements of small tumors tend to fluctuate together over the course of the experiment, suggesting other contextual human influence on the recorded measurements (Figure 3F, expanded in Supplementary Figure 3C). To investigate this, we fitted a function to predict caliper volume from the CT volume for each mouse (Figure 4A) and used these fits to regress out the effect of the actual tumor volume on the caliper measurement. The resulting residual measured volumes (Figure 4B) tended to be higher on some days than others, and, in fact, correlated strongly with the average total tumor volume (Figure 4C), suggesting a tendency to inflate reported volumes of small tumors when large tumors are present.
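The regress-out step can be sketched as below. The form of the fitted function is not given in this section, so a straight line per mouse is used here purely as a stand-in; the record layout and all function names are likewise assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    return slope, mean_y - slope * mean_x


def daily_mean_residuals(records):
    """Mean residual caliper volume per day after a per-mouse fit.

    records: {mouse_id: [(day, ct_volume, caliper_volume), ...]}
    (hypothetical layout). Returns {day: mean residual}, sorted by day.
    """
    by_day = {}
    for rows in records.values():
        slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
        for day, ct_volume, caliper_volume in rows:
            residual = caliper_volume - (slope * ct_volume + intercept)
            by_day.setdefault(day, []).append(residual)
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The daily mean residuals can then be passed to `pearson` together with a per-day summary of tumor burden to test whether the residuals track the size of the tumors present on each day.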
It is, in some sense, rational to inform the measurements with outside information, as the presence of large tumors may more accurately predict when the small tumors will start to regrow than observations of the small tumors themselves. However, it is obviously problematic from a data analysis perspective. The range of caliper volumes reported for tumors whose actual volumes were likely near zero (Figure 4A) introduces an opportunity for the measurer to substantially bias the average volume reported for a group of mice, which might falsely confirm an incorrect hypothesis.
Notably, the measurer had direct access to measurements from previous days when recording each new set, as is common practice. This reduces the probability of reporting clearly erroneous measurements but means that the data are not truly independent between timepoints. Tumor growth curves (Figure 3F) are likely smoothed to appear more biologically plausible through the influence of previous measurements. One undesired consequence of this can be seen in the tumor growth rates calculated by comparing consecutive volume measurements from each method (Figure 4D). In this experiment, several tumors spontaneously and unexpectedly shrank. The caliper measurer noticed this but was vocally skeptical of the negative growth rates, remeasured, and ultimately recorded fewer and less dramatic reductions in tumor volume than the automatic segmentation results eventually confirmed.
Effects of tumor shape on caliper data
We considered the tendency to record similar consecutive measurements of the same tumor as a possible contributor to the observed discrepancy between Figure 2C and Figure 3A (plotted together in Supplementary Figure 3A) but chose to investigate an alternative explanation based on differences in tumor shape. The tumors measured for Figure 2 were derived from the MOC2 cell line and mostly untreated, whereas the tumors measured for Figure 3 were derived from the P029 cell line and treated with radiotherapy. It was therefore plausible that caliper measurements tended to overestimate MOC2 tumor volumes and underestimate P029 tumor volumes because of a difference in shape. To investigate this, we computed the eigendecomposition of the covariance matrix of the coordinates of each voxel of tumor relative to its centroid (essentially principal component analysis). A convenient geometric interpretation of this is that, for a perfectly ellipsoidal tumor, the volume would be proportional to the product of the square roots of the eigenvalues. For these tumors, the two largest eigenvalues would roughly correspond to the distances measured by calipers (Figure 4E, Supplementary Figure 3B). Caliper volumes were not more strongly correlated with volumes from the ellipsoid approximation than with the volume from voxel counting overall (Supplementary Figure 4A), but the correlation was somewhat stronger for small tumors (Supplementary Figure 4B), suggesting that reported caliper measurements were not related to the size of the tumor in the dimensions measured, but missing information about the thickness of the tumor makes Eqn. 1 a poor approximation to the volume.
To compare tumor shapes between the two experiments, we “normalized” the eigenvalues by taking the square root, then dividing by the sum of the square roots (Figure 4F, G). MOC2 tumors tended to be less elongated than P029 tumors of similar volumes, having relatively smaller largest eigenvalues and larger middle eigenvalues. This is consistent with the higher caliper volumes reported in Figure 2B, because the middle eigenvalue approximately corresponds to the shorter caliper distance (top panel of Supplementary Figure 3B), which is squared in Eqn. 1. While this application of tumor shape analysis is somewhat mundane, it demonstrates a critical strength of tumor measurement by image segmentation: in an experiment where some tumors were treated or genetically modified in a way that affects their shape, caliper measurements might detect this as a difference in tumor volume, or simply miss the important effect. With segmented scans of each mouse, we can test new hypotheses about tumor shape and location post hoc. For example, tumor location varies within experiments (Supplementary Figure 4C), as does the fraction of a tumor that could fit within a sphere of the same volume (Supplementary Figure 4D), but the MOC2 and P029 tumors show similar distributions in these metrics (Supplementary Figure 4E, F), suggesting that the elongation of P029 tumors may not be due to increased invasion into the neck.
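The shape descriptors discussed above can be illustrated with a short sketch. It uses the standard result that a uniformly filled ellipsoid with semi-axis a has coordinate variance a²/5 along that axis, so each semi-axis can be recovered as the square root of five times the eigenvalue. The function names are hypothetical, and the closed-form trigonometric eigenvalue routine is simply one convenient way to handle a symmetric 3x3 matrix without external libraries; it is not claimed to be the method used in this work.

```python
import math


def covariance3(points):
    """3x3 covariance matrix of voxel coordinates about their centroid."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - centroid[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    return [[value / n for value in row] for row in cov]


def eigenvalues_sym3(m):
    """Eigenvalues of a symmetric 3x3 matrix, largest first."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [largest, 3.0 * q - largest - smallest, smallest]


def ellipsoid_volume(eigenvalues):
    """Volume of the uniform ellipsoid whose covariance has these eigenvalues."""
    a, b, c = (math.sqrt(5.0 * max(e, 0.0)) for e in eigenvalues)
    return 4.0 / 3.0 * math.pi * a * b * c


def normalized_axes(eigenvalues):
    """Square roots of the eigenvalues divided by the sum of the square roots."""
    roots = [math.sqrt(max(e, 0.0)) for e in eigenvalues]
    total = sum(roots)
    return [r / total for r in roots]
```

For a voxelized tumor, `covariance3` would take the list of voxel coordinates scaled to real-space units; comparing `ellipsoid_volume` with the voxel-count volume then indicates how far the shape departs from an ellipsoid.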
Automatic image segmentation improves tumor growth-curve analysis
We hypothesized that a primary benefit of more accurate tumor volume data would be more fruitful analysis of tumor growth dynamics. Growth curves for three mice are shown in Figure 5A with 3D renders of the corresponding tumors, showing deleterious effects of tumor shape on caliper measurement. In the first example, the tumor was very flat, causing the caliper measurer to miss it entirely (Figure 5A, left). The next tumor is of typical shape and exemplifies the seemingly random fluctuations in caliper volume at early timepoints, followed by a significantly underestimated growth rate at late timepoints, observed for most mice (Figure 5A, center). Finally, the third tumor did not respond as completely as most to radiotherapy, which is apparent from high CT volume as early as day 18, but not from caliper data until day 38 (Figure 5A, right).
To demonstrate the utility of more accurate tumor growth curves, we developed a simple model for the growth dynamics of tumors in these experiments, where all tumors were treated with x-ray radiotherapy (XRT):
This includes a gaussian to model the initial peak around the time of XRT, a logistic curve to model the regrowth phase, and a constant term to improve fitting to the caliper dataset (Figure 5B). This appears to capture the most important characteristics of each curve in a small number of easily interpretable parameters (Figure 5C). We hypothesized that tumor volumes during and shortly after the time of XRT might be predictive of the parameters f (the fractional rate of regrowth) and g (the time of regrowth) characterizing the eventual regrowth phase. In CT-measured curves, we found that the average volume 8-20 days after tumor implantation negatively correlated with f and g (Figure 5D, E). Similar trends may be present in the caliper data as well, but less clearly. Correlation statistics are given in Table 1. Interestingly, there is also a negative correlation between f and both the tumor’s sphericalness (Supplementary Figure 4G) and a (tumor volume at the time of XRT) (Supplementary Figure 5A), but a positive correlation between c (time for the tumor to shrink after XRT) and g (Supplementary Figure 5B). Both a and c contribute positively to the averaged tumor volumes reported (Figure 5D, E), yet the correlation between c and g is in the opposite direction, suggesting that the gaussian fit parameters may tease apart two separate effects: tumors that shrink slowly after XRT also take longer to enter the regrowth phase, and tumors that grow faster before XRT regrow more slowly. Similar correlations are also observed with the tumor shape metrics at early timepoints (Supplementary Figure 4G, Supplementary Figure 5C, D). These findings require validation and explanations beyond the scope of this paper, but demonstrate how more accurate tumor measurement can reveal biologically interesting effects that would otherwise go unnoticed.
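To make the three components named above concrete, the sketch below evaluates a Gaussian peak plus a logistic regrowth term plus a constant. This is only one plausible parameterization written for illustration: the parameter names are hypothetical, and it should not be read as the exact equation or symbol definitions used for the fits in Figure 5.

```python
import math


def growth_model(t, amp, t_peak, width, plateau, rate, t_regrow, offset):
    """Illustrative tumor volume at time t (all parameter names hypothetical).

    amp, t_peak, width : height, center, and width of the Gaussian peak
                         around the time of XRT
    plateau, rate, t_regrow : ceiling, steepness, and midpoint time of the
                              logistic regrowth phase
    offset : constant baseline term
    """
    gaussian = amp * math.exp(-((t - t_peak) ** 2) / (2.0 * width ** 2))
    logistic = plateau / (1.0 + math.exp(-rate * (t - t_regrow)))
    return gaussian + logistic + offset


# Invented parameter values, chosen only to show the shape of the curve.
curve = [growth_model(day, 100.0, 10.0, 3.0, 500.0, 0.3, 60.0, 5.0)
         for day in range(0, 101, 5)]
```

In a form like this, the Gaussian parameters describe the response around XRT and the logistic parameters describe regrowth, which is what allows early-phase and late-phase behavior to be compared parameter by parameter.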
Table 1. Statistics for correlations involving growth curve fitting parameters