A better Standard-Uptake-Value body habitus normalizer for fluorodeoxyglucose in humans

Purpose: To devise a new body-habitus normalizer to be used in the calculation of a standardized uptake value (SUV) that is specific to the PET tracer 18 F-FDG. Methods: After exclusions for type and extent of cancer and timing of the scan, a cohort of 481 patients was selected for analysis of 18 F-FDG uptake into “normal” tissues (presumed to be unaffected by their disease). Among these, 65 patients had only brain concentrations measured and the remaining 416 were randomly divided into an 86-patient test set and a 330-patient training set. Within the test set, normal liver, spleen and blood measures were made. In the training set, only normal liver concentrations were measured. Using data from the training set, a simple polynomial function of height and weight was selected (following a subjective procedure) to predict each patient’s mean liver percent injected dose per milliliter (%ID/ml). This function, when used to normalize measured %ID/ml concentrations, defines a new SUV metric (SUVfdg), which we compared to SUV metrics normalized by body weight (SUVbw), lean-body mass (SUVlbm) and body surface-area (SUVbsa) in a five-fold cross-validation. SUVfdg was also tested on the independent holdout sets utilizing the measurements of normal liver, blood, spleen and brain. Results: For patients of all sizes including pediatric patients, the normal range of liver 18 F-FDG uptake concentration at 60 minutes post injection in units of SUVfdg is 1.0 ± 0.16. Liver, blood and spleen SUVfdg in all comparisons had lower coefficients of variation (CoV) compared to SUVbw, SUVlbm and SUVbsa. Blood had a mean SUVfdg value of 0.8 ± 0.11 and showed no correlation with age, height or weight. Brain SUVfdg measures were significantly higher (P<0.01) in pediatric patients (4.7 ± 0.9) compared to adults (3.1 ± 0.6). Conclusion: A new SUV metric, SUVfdg, is proposed. It is hoped that SUVfdg will prove to be better at classifying tumor lesions and other tissues compared to SUV metrics in current use and may be useful in predicting patient-specific radiation dose. Other tracers may benefit from similarly tracer-specific body habitus normalizers.


INTRODUCTION
Standard clinical Positron Emission Tomography (PET) systems typically measure mean radioactivity concentration with a consistency of about 2.5%, limited primarily by the PET calibration process and the stability of the camera over time [1]. However, radioactivity concentration per se is often not a useful metric owing to its variation with the radioactivity of the injected dose. In order to monitor a tumor's uptake of 18 F-FDG over several weeks or months, for example, it is necessary to normalize the PET radioactivity concentration by the dose injected at each session, converting it into units of percent injected dose per milliliter (%ID/ml). Meaningful use of this metric assumes a degree of stationarity of the patient's bodily systems between measurements, consistent timing of the measurement post injection, and linearity of the tissue uptake with injected dose within the range of doses administered (i.e. doubling the injected dose doubles the tissue concentrations).
While %ID/ml is useful for intra-subject comparisons, it does not allow for meaningful comparisons between patients because it does not account for the variation in tissue uptake of 18 F-FDG as a function of the patient body habitus. Larger patients tend to have lower %ID/ml concentrations because the radioactivity is distributed into a larger volume. Thus, to facilitate comparisons of tissue uptake across patients, an additional normalization is necessary. If the radiotracer distribution were essentially uniform within the body, then the appropriate additional normalizer would be the patient's body mass (i.e. doubling the patient's size halves the tissue concentrations). And indeed, this is the normalizer that is used most frequently.
Normalization by injected dose and subject body mass dates back to at least 1941 when, following a recommendation by Failla (attributed by Woodard in [2]), it was employed by Kenney, Marinelli and Woodard [3] in their ex-vivo measurements of patient tumor tissue radioactivity following administration of 32 P-Na2HPO4. The term they coined to describe this metric was the "differential absorption ratio" (DAR) (see equation 1), but later this function was referred to by a variety of other monikers including "differential retention", "percent mean body concentration", "differential uptake ratio", "dose uptake ratio", "dose absorption ratio" and, most commonly in PET studies today, the "standard uptake value" normalized by body weight (SUVbw).
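For orientation, the body-weight-normalized ratio described above is commonly written in the form below. This is a sketch in standard notation and not necessarily the authors' own equation 1; the symbols C_tissue, A_inj and m_body are labels chosen here for illustration.

```latex
\mathrm{SUV_{bw}} \;=\; \mathrm{DAR} \;=\;
\frac{C_{\mathrm{tissue}}\ [\mathrm{Bq/g}]}
     {A_{\mathrm{inj}}\ [\mathrm{Bq}] \,/\, m_{\mathrm{body}}\ [\mathrm{g}]}
```

Under this definition, a tracer that distributed uniformly throughout the body would give a value of 1 in every tissue.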
Although the SUVbw metric is, to this day, widely employed, its deficits have frequently been raised, and at no time more strongly than by Keyes who in 1995 [4] concluded that it was a "silly useless value".
Most of Keyes' objections could easily be addressed (e.g. by fixing the uptake period) or were not really about the SUV metric itself (e.g. partial volume effects) but at least for 18 F-FDG (and likely for many other radiotracers) he correctly pointed out that interpatient differences in body composition and habitus are not well described by a linear function of body mass alone.
The need for a body habitus normalizer other than body weight stems from the fact that 18 F-FDG does not distribute equally into all the normal tissues. On a per unit mass basis, uptake into adipose tissue in particular is much lower than uptake into most other tissues. Thus, of two subjects of identical mass, the one with a larger fraction of that mass in the form of adipose reserves will tend to have larger SUVbw values in all of their tissues.
Following this reasoning, Zasadny and Wahl in 1993 proposed that FDG uptake be normalized by lean-body mass (SUVlbm) and showed that SUVbw measures of normal blood, liver and spleen all retained a strong correlation with body weight, whereas for SUVlbm, this correlation was greatly diminished. In a similar vein, Kim et al. proposed in 1994 [5] normalizing instead by patient body surface-area (SUVbsa) and likewise showed reductions in liver correlation with weight. In neither of these studies was the patient body habitus normalizer (i.e. the actual lean-body mass or body surface-area) measured directly.
Instead the normalizer was estimated using simple functions of height and weight, with the lean-body mass estimate making use of two separate functions, one for males and one for females. Kim et al [6] later went on to directly compare SUVlbm to SUVbsa concluding that SUVbsa was superior based upon its relative lack of correlation to body habitus metrics. Nevertheless, in 2009, Wahl et al. [7] incorporated SUVlbm (a.k.a. SUL) into their PERCIST criteria (the PET equivalent to the CT-based RECIST criteria) proposing it to be used as the standard for the evaluation of tumors using 18 F-FDG.
Debate over these SUV metrics has continued through to the present day [8], much of this highlighting the vagaries of the lean-body mass and body surface-area estimates [9,10] each of which can be calculated with one of several different formulas, while others have proposed various means of direct measurement of lean-body mass or body surface area [11][12][13][14] or other ancillary corrections [15,16].
Despite these cogitations and the evidence suggesting that either SUVbsa or SUVlbm would be a better choice, SUVbw remains as the most commonly reported metric in the literature and likely also in clinical use.
In the following, we propose to take a slightly different tack in addressing this question, recognizing that SUVbw, SUVlbm and SUVbsa are all simply functions of patient height, weight and sex, and that perhaps none of these surrogates is the optimal body habitus normalizer for 18 F-FDG. Based on this premise, we will seek to devise a completely new normalizing function, one that is specific to 18 F-FDG. As was the case in the previous evaluations of SUV metrics, we will assume that 18 F-FDG uptake in a normal liver does not itself vary systematically with body habitus, age or sex. Moreover, we assert that confounding factors of any sort can only increase an SUV metric's coefficient of variation (CoV) above the liver's true normal range and thus smaller CoV values are indicative of a less biased normalizer.

Patients:
The data used in this study was acquired from patients receiving standard of care 18 F-FDG scans at our institution, mostly for the diagnosis and monitoring of cancerous lesions. Patients were excluded if they were diagnosed with a non-solid tumor type, had extensive disease, had any indication of lesions within an organ being measured or were imaged outside of the 55-75 minute post injection time window recommended by the European Association of Nuclear Medicine (EANM) [17] and the Quantitative Imaging Biomarkers Alliance (QIBA) [18]. A total of 481 patients meeting these criteria were included in the study. A subset of these (100 in all) were specifically sought after, selected based on their age (15 or under) in order to enrich the sample with smaller sized subjects.
Of the 481 patients, 65 had only their normal brain 18 F-FDG uptake measured. The remaining 416 patients were randomly divided into a 330-subject training group that received only normal liver uptake measurements and an 86-member test group within which normal liver, spleen and blood 18 F-FDG concentrations were measured. Of the 330 training group members, 153 were adult women, 116 were adult men and 61 were pediatric patients (note: here the division between pediatric and adult was taken to be 12 years of age, i.e. "adults" > 12 y). Within the test cohort there were 45 adult women, 31 adult men and 10 pediatric patients. Within the brain-only cohort, there were 14 adult women, 29 adult men and 22 pediatric patients.
Subjects were included regardless of what PET scanner model was used, so the cohort includes a mixture of scans from various GE PET cameras including Discovery PET/CT models DST, DSTE, D600, D690, D710, 3-ring DMI, 5-ring DMI and a Signa PET/MR. This data was reviewed under the auspices of a retrospective research protocol which, given the lack of risk posed to the patients by this study, allowed for a waiver of consent.

Measurements:
Within the training and test cohort patient scans, a single large region of interest (ROI), representing a volume of approximately 14 ml, was drawn over a representative homogeneous central region of the liver, well away from the diaphragm. Within the test cohort scans, additional ROIs were placed over homogeneous regions well within the descending aorta (to measure the blood concentration) and the spleen. Within scans of the brain-only test cohort, a single ROI was placed over a frontal grey matter region. In addition to the mean radioactivity concentration within these regions, the following measures describing the patient scan were compiled: patient age at time of scan, weight, height, sex, injected radioactivity and the time interval between the injection and when the bed position over a measured region was acquired. All radioactivity concentration measures were appropriately decay corrected and divided by the injected activity to arrive at units of %ID/ml. This value was then multiplied by the patient's body weight in grams, which, if one assumes 1 g/ml, results in unitless SUVbw values. The values were also multiplied by the calculated body surface-area and lean-body mass to arrive at SUVbsa and SUVlbm measures, respectively, making use of the body surface-area estimation function proposed by Du Bois [19] and the lean-body mass function used by Lodge and Wahl for PERCIST [20].
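The conversions described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' processing code: the function names, the example patient and the choice of injection time as the decay reference are assumptions made here. The body surface-area function uses the widely published Du Bois coefficients, and the lean-body mass function is omitted because its exact form is not stated in the text.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def decay_correct(conc_bq_per_ml, minutes_elapsed):
    """Scale a concentration measured `minutes_elapsed` after the reference
    time back to that reference time (assumed here to be the injection)."""
    return conc_bq_per_ml * 2.0 ** (minutes_elapsed / F18_HALF_LIFE_MIN)


def percent_id_per_ml(conc_bq_per_ml, injected_bq):
    """Percent of the injected dose per milliliter of tissue."""
    return 100.0 * conc_bq_per_ml / injected_bq


def suv_bw(pid_per_ml, weight_kg):
    """Unitless SUVbw: (%ID/ml / 100) x body weight in grams, assuming 1 g/ml."""
    return pid_per_ml / 100.0 * weight_kg * 1000.0


def bsa_du_bois(weight_kg, height_cm):
    """Du Bois body surface-area estimate in square meters."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def suv_bsa(pid_per_ml, weight_kg, height_cm):
    """SUVbsa in m^2/ml (fraction of injected dose per ml times BSA)."""
    return pid_per_ml / 100.0 * bsa_du_bois(weight_kg, height_cm)


# Hypothetical patient (not study data): 70 kg, 170 cm, 370 MBq injected,
# liver ROI reading 5000 Bq/ml already decay corrected to injection time.
pid = percent_id_per_ml(5000.0, 370e6)
print(f"%ID/ml = {pid:.6f}")
print(f"SUVbw  = {suv_bw(pid, 70.0):.3f}")
print(f"BSA    = {bsa_du_bois(70.0, 170.0):.2f} m^2")
```

Note that the division by 100 in `suv_bw` converts percent to a fraction, which is what makes the resulting value unitless.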

Model Development:
In seeking an empirical functional form that would well describe the relationship between the liver mean %ID/ml and body habitus, we first reasoned that these two quantities should be roughly inversely proportional and therefore chose to attempt to model the multiplicative inverse of the liver %ID/ml (i.e. its mean concentration in units of ml/%ID). Moreover, since it was our preference that our model achieve specifically a high percent accuracy and result in only positive normalizing values, we chose to fit its log values (i.e. log[ml/%ID]).
Through some experimentation with the training set, least squares fits of various functions were compared [Curve Fitting Toolbox v 3.5.11, The MathWorks, Inc.] and a subjective "best" was selected making use of Bayesian (BIC) [21] and Akaike information criteria (AIC) [22], the adjusted R-squared value [23] of the fits and a visual examination of the residuals.
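The model comparison described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' MATLAB procedure: the data are synthetic stand-ins, the candidate designs are examples only, and the AIC and BIC are written in their common least-squares form (up to an additive constant), which is an assumption about the convention used.

```python
import math
import random


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_criteria(X, y, beta):
    """AIC, BIC (least-squares form) and adjusted R-squared of a linear fit."""
    n, k = len(y), len(beta)
    pred = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    return aic, bic, adj_r2


# Synthetic stand-in for the training set (NOT the study data).
random.seed(0)
heights = [random.uniform(0.7, 1.95) for _ in range(330)]             # meters
weights = [22.0 * h * h * random.uniform(0.7, 1.5) for h in heights]  # kg
log_ml_per_pid = [4.0 + 1.5 * h - 0.2 * h * h + 0.006 * w + random.gauss(0, 0.15)
                  for h, w in zip(heights, weights)]

# Example candidate designs; each maps (height, weight) to a design-matrix row.
designs = {
    "2nd-order height + weight": lambda h, w: [1.0, h, h * h, w],
    "3rd-order height + weight": lambda h, w: [1.0, h, h * h, h ** 3, w],
}
results = {}
for name, row in designs.items():
    X = [row(h, w) for h, w in zip(heights, weights)]
    beta = solve_least_squares(X, log_ml_per_pid)
    results[name] = fit_criteria(X, log_ml_per_pid, beta)
    print(name, ["%.3f" % v for v in results[name]])
```

Lower AIC and BIC and higher adjusted R-squared favor a model. Because the BIC penalizes each extra parameter by ln(n) rather than 2, it can prefer the lower-order model even when the other two criteria do not.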

Model Validation and Testing:
Using the selected fitting function model, the training set was then entered into a 5-fold cross-validation study. In this study the training set was first randomly divided into five subgroups each containing 20% (i.e. 66) of the patients. Each of the 5 groups was then, in turn, used as a validation set, with the remaining 80% (264 patients) used to train (i.e. fit) the model. In each of the five validations, CoVs and correlations to height and weight for each of the four SUV metrics (SUVbw, SUVlbm, SUVbsa and SUVfdg) were calculated and based on these numbers the performance of our proposed body habitus normalizing (BHN) function was assessed.
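The fold structure of this cross-validation can be sketched as follows. The sketch is illustrative only: the data are synthetic, and a one-variable linear fit of log(ml/%ID) on height stands in for the selected BHN model so that the example stays short. The normalized value for each validation patient is the measured %ID/ml multiplied by the predicted ml/%ID.

```python
import math
import random


def fit_line(x, y):
    """Closed-form simple linear regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def cov(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return sd / m


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Synthetic stand-in data (NOT the study data): height in meters and a
# liver %ID/ml whose log-inverse is linear in height plus noise.
rng = random.Random(1)
height = [rng.uniform(0.7, 1.95) for _ in range(330)]
liver_pid = [math.exp(-(4.0 + 1.2 * h + rng.gauss(0, 0.15))) for h in height]

folds = k_fold_indices(330, 5, rng)
fold_covs, fold_means = [], []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(330) if i not in held]
    # Fit log(ml/%ID) using the 264 training patients only.
    a, b = fit_line([height[i] for i in train],
                    [-math.log(liver_pid[i]) for i in train])
    # Normalize the 66 validation patients with the fitted function.
    suv = [liver_pid[i] * math.exp(a + b * height[i]) for i in held_out]
    fold_covs.append(cov(suv))
    fold_means.append(sum(suv) / len(suv))

print("CoV per fold:", [round(c, 3) for c in fold_covs])
```

The same loop would be repeated for each competing normalizer, with the per-fold CoVs and correlations compared across metrics.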
Following this validation, a single fitting procedure using the selected BHN model was applied to the entire 330 patient training set to determine its parameter values. This BHN function was then used to calculate the SUVfdg values for all the normal tissue measurements taken from the two test sets. As was done in the cross-validation, SUVbw, SUVlbm and SUVbsa values were determined and compared based on their CoVs and correlations to height and weight, but in addition for the test cohorts correlation to age was also tested.

Statistics:
For every test of a linear relationship between a variable (SUV, residual, etc.) and patient height, weight or age, a Pearson's correlation coefficient R and associated P value were determined. This P value indicates the probability of seeing a sample correlation coefficient of that magnitude when the true population correlation is zero and was calculated using two tails of a t-distribution with n-2 degrees of freedom (where n is the number of samples) after first converting the R value to a t-statistic using the formula $t = R\sqrt{\frac{n-2}{1-R^{2}}}$. In all cases significance was assessed at an alpha level of 0.05, corrected for multiple comparisons following Bonferroni [24] where indicated. The comparison of adult and pediatric brain SUVfdg values was made with an unpaired, two-tailed, two-sample t-test assuming unequal variances.
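The correlation test can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: the function names are invented here, and the two-tailed P value is obtained by numerically integrating the Student t density with Simpson's rule, which is adequate for a sketch but loses precision for extremely small P values.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Probability density of Student's t-distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(t, df, steps=20000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] and symmetry.
    `steps` must be even."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2.0 * s * h / 3.0  # probability mass within [-|t|, |t|]
    return max(0.0, 1.0 - central)


def correlation_test(x, y):
    """Return (R, t, P) with t = R * sqrt((n - 2) / (1 - R^2)).
    Undefined for a perfect correlation (|R| = 1)."""
    n = len(x)
    r = pearson_r(x, y)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t, two_tailed_p(t, n - 2)


# Toy data for illustration only.
r, t, p = correlation_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
alpha_bonferroni = 0.05 / 3  # e.g. three comparisons: age, height, weight
print(f"R = {r:.3f}, t = {t:.3f}, P = {p:.3f}, "
      f"significant = {p < alpha_bonferroni}")
```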

RESULTS
Cohort Characterization: Subjects ranged from 9 months to 91 years of age and were roughly evenly [...] Table 1). The AIC and adjusted R-squared values showed a slight preference for the 3rd order model, but the BIC was best for the 2nd order polynomial function, so both of these models remained under consideration.
Although functions of weight alone did not appear to predict the liver concentrations well, there was still a potential that the addition of height information might improve the fit substantially. Therefore, we added a linear term incorporating height to the 3rd order function of weight (see Model C in equation 2).
However, the fit continued to be poor, especially for small patients (see supplementary figure S3), and so we dropped Model C from further consideration.
Then, to ascertain whether adding weight information might improve the estimate of the 3rd order function of height model, we plotted its residuals as a function of patient weight (see figure 1b). This plot showed that there was remaining correlation which could perhaps be reduced if weight were to be incorporated. This potential also remained for the 2nd order function of height, so to each of these models was added a single parameter, d, incorporating the weight information. We will hereafter refer to the 3rd order height plus 1st order weight function as Model A and the 2nd order height plus 1st order weight function as Model B (see equation 2).
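Based only on the verbal descriptions above, the three candidate models might be written as follows. This is a hedged sketch, not the authors' equation 2: the coefficient names other than d are assumptions, as is the use of the natural logarithm, and no fitted parameter values are implied.

```latex
% y = \log(\mathrm{ml}/\%\mathrm{ID}) for normal liver; h = height, w = weight
\begin{aligned}
\text{Model A:}\quad y &= a_0 + a_1 h + a_2 h^2 + a_3 h^3 + d\,w \\
\text{Model B:}\quad y &= a_0 + a_1 h + a_2 h^2 + d\,w \\
\text{Model C:}\quad y &= b_0 + b_1 w + b_2 w^2 + b_3 w^3 + c\,h
\end{aligned}
\qquad
\mathrm{SUV_{fdg}} = (\%\mathrm{ID/ml}) \times e^{\hat{y}(h,\,w)}
```

Under this reading, the normalizer is the predicted liver ml/%ID, so a normal liver would be expected to have an SUVfdg close to 1.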

[Equation 2: Models A, B and C]

Interestingly, measurements of gray matter uptake taken from the independent brain-only test cohort show a small reduction in all four SUV metrics as a function of age in adult patients (see supplementary figure S6). These decreases did not reach statistical significance but are consistent with at least one other study which showed reduced brain glucose metabolic rates in older adults based on a modeled quantitative assessment of 18 F-FDG uptake [25]. SUVfdg, SUVbsa and, to a lesser extent, SUVlbm all showed noticeably higher levels in the pediatric patients (<= 18 y) within this cohort. The SUVfdg values for these two groups, 4.7 ± 0.9 for pediatric patients compared to 3.1 ± 0.6 for adults, were found to be significantly different (P<0.01) in a two-sample t-test. This was not the case, however, for SUVbw. All CoV and correlation results for all tissues measured in the independent test cohorts were also tabulated (see Table 3).
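The unequal-variance (Welch) t-test used for the adult versus pediatric comparison can be computed from summary statistics, as in the sketch below. The means and standard deviations are the reported brain SUVfdg values, but the group sizes are assumptions made purely for illustration, because the split at 18 years used here is not broken down in the text.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations and sizes."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Group sizes (22 pediatric, 43 adult) are ASSUMED for illustration only.
t, df = welch_t(4.7, 0.9, 22, 3.1, 0.6, 43)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t statistic would then be compared against a two-tailed t-distribution with the computed degrees of freedom.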

DISCUSSION
Herein we propose a new body habitus normalizer to be used when calculating SUV values within PET [...] Normalization to reference tissues, adjustments based on blood glucose measurements, or normalizations based on direct measurements of fat, muscle and other normal tissue volumes may all prove to be better than the metric we propose in some contexts. However, in keeping with the spirit of SUV-type measurement, the metric we propose is applicable in all contexts, regardless of what body parts are scanned and regardless of the availability of other refining variables.
A key assumption when calculating and using this new body-habitus normalizer is that the rate constants governing normal liver 18 F-FDG uptake are essentially the same (i.e. within a normal range) across all subjects regardless of age or sex. In other words, we have assumed that normal liver is itself a good normalizer, one whose uptake is proportional to the area under the curve of the arterial blood input function up until the time of the PET measurement at 60 minutes post injection. This assumption is strongly supported by our SUVfdg measurements of the blood. As such, the proposed function should also be a useful normalizer for most other normal and abnormal tissues within the body.
One potential caveat to this assumption and possible confound in this study, is that we did not screen for fatty liver disease or other liver morbidities that might correlate with body habitus [26]. As such, the proposed BHN in its current form effectively includes estimations of the prevalence and impact of these disease processes in our population. Similarly, if liver 18 F-FDG uptake in absolute terms varied significantly with age in the pediatric population, the proposed SUVfdg metric would normalize away that difference given the high correlation between age and height in children. In other words, an SUVfdg value of 1.0 can be considered to be the normal liver uptake level for pediatric subjects regardless of age even if the uptake in units of mg/min/100g were to go up or down as a function of age. Given our results in the blood, however, we feel these effects are at most, small.
When evaluating the performance of SUVfdg relative to the other SUV types we have relied on two related assessments: each SUV metric's CoV and the absence of any correlation to body habitus (specifically height and weight). The assessment based on correlation to body habitus has been used by others [5,6,27] but to our knowledge we are the first to make use of CoV for this purpose. In using CoV to assess SUVs, we reasoned that the optimal SUV metric should accurately reflect the normal variation in liver 18 F-FDG uptake and that any additional noise or confounds would only lead to increased CoV.
This should be true so long as the variance in normal values, which presumably are randomly distributed about the mean, is not itself correlated with the SUV metric. Because of its reduced CoV and reduced correlation to body habitus, we anticipate that SUVfdg will likely outperform SUVbw, SUVlbm and SUVbsa when used to distinguish between two or more conditions, for example when classifying benign and malignant tumors across multiple patients.
To the extent that the proposed BHN function can accurately predict the uptake to the normal liver and proportionately that of other normal organs (in units of %ID/g), this function may be useful in models seeking to estimate patient-specific radiation dose, thus allowing an a priori individualized assessment of the risk posed by the 18 F-FDG injection. Similarly, this same information can be used in models of patient attenuation and scatter, which can then be combined with models of specific PET cameras to arrive at estimates of the expected noise equivalent count (NEC) rate for different body-parts. This information can then, in turn, be used to adjust imaging time to achieve a target image quality.
Assuming the intrinsic resolution of most clinical PET cameras is about the same (or can be made so with appropriate smoothing) matching total effective NECs (factoring in the use of time-of-flight and the camera's timing resolution) should go a long way towards harmonizing image quality across patients of different sizes and across institutions having a mixture of PET camera models.
Although SUVfdg is specific to 18 F-FDG PET, the concept behind it should be applicable to all tracers for which a suitable normal reference tissue can be found, and where any metabolism is either consistent or at least predictable across patients.

CONCLUSION
A new body habitus normalizer and associated SUV metric are proposed. This metric, SUVfdg, is intended to be used solely for the evaluation of the uptake of FDG and may in future studies be shown to outperform SUV metrics normalized by body weight, lean-body mass and body surface area.

FIGURE 1. Scatter plots showing the fit of the training set data to a third-order polynomial function (a), the residuals of that fit (b), the residuals of the Model A fit to the log of the mean liver ml/%ID as a function of height (c) and as a function of weight (d). Blue points depict male patients, red points refer to female patients and black points are children under the age of 12. The fits of the residuals shown in (c) and (d), along with the associated correlation coefficients and P values shown in the legend, excluded patients under the age of 12 so that these patients would not have outsized influence over the correlation. In all cases, regardless of whether the pediatric patients were included or not, there was no significant correlation of the residuals to body habitus.

[...] For the independent test data, these scatter plots compare the correlations in liver SUVbw (column a,c,e) and SUVfdg (column b,d,f) measurements with age (row a,b), height (row c,d) and weight (row e,f).

FIGURE 5. For the independent test data, these scatter plots compare the correlations in blood SUVbw (column a,c,e) and SUVfdg (column b,d,f) measurements with age (row a,b), height (row c,d) and weight (row e,f). Note, blood concentrations were not measured in the training cohort and played no part in determining the BHN function used to calculate these SUVfdg values.

SUPPLEMENTAL FIGURES
FIGURE S1. Histogram showing the distribution of ages for the entire cohort of 481 patients. The frequencies for females are indicated by the red bars and those for males by the blue bars. Note: where the blue and red bars overlap, the area appears purple. This plot shows that the sampling was roughly uniform with respect to patient age, a result achieved owing to an enhanced search for younger patients.
FIGURE S5. For the independent test data, these scatter plots compare the correlations in normal spleen SUVbw (column a,e,i), SUVlbm (column b,f,j), SUVbsa (column c,g,k) and SUVfdg (column d,h,l) measurements with weight (row a,b,c,d), height (row e,f,g,h) and age (row i,j,k,l). Note, spleen concentrations were not measured in the training cohort and played no part in determining the BHN function used to calculate these SUVfdg values.
FIGURE S6. For the brain-only independent test data, these scatter plots compare the correlations in normal frontal gray matter SUVbw (column a,e,i), SUVlbm (column b,f,j), SUVbsa (column c,g,k) and SUVfdg (column d,h,l) measurements with weight (row a,b,c,d), height (row e,f,g,h) and age (row i,j,k,l). Note, brain concentrations were not measured in the training cohort and played no part in determining the BHN function used to calculate these SUVfdg values. The lines and associated parameters seen in the legends were fitted to data from only the adult (>18 y) patients. The SUVfdg values suggest a significant difference in brain glucose metabolism between adult and pediatric populations.

TABLES
Results from the 5-fold cross validation comparing SUVbw, SUVlbm, SUVbsa and SUVfdg based upon the coefficients of variation (smaller is better) and correlations to height and weight (again, smaller is better). The "N of 5" rows indicate the number of times a correlation coefficient was determined to be significant.
0.000 ± 0.000   0.000 ± 0.000   0.082 ± 0.065   0.541 ± 0.194
N of 5 sig   5   5   1   0

Variance and correlation results for SUVbw, SUVlbm, SUVbsa and SUVfdg metrics for data measured from patients within the independent test cohorts. 18 F-FDG concentrations were measured in normal liver, blood, spleen and brain. CoV is the coefficient of variation, R is the correlation coefficient and P is the probability of seeing an R value of that magnitude when the true population correlation is zero.