Post-acquisition Standardization of Positron Emission Tomography Images

Purpose: Tissue radiotracer activity measured from positron emission tomography (PET) images is an important biomarker that is clinically utilized for diagnosis, staging, prognostication, and treatment response assessment in patients with cancer and other clinical disorders. Using PET image values to define a normal range of metabolic activity for quantification purposes is challenging due to variations in patient-related factors and technical factors. Although the formulation of standardized uptake value (SUV) has compensated for some of these variabilities, significant non-standardness still persists. We propose an image processing method to substantially mitigate these variabilities. Methods: The standardization method is similar for activity concentration (AC) PET and SUV PET images with some differences and consists of two steps. The calibration step is performed only once for each of AC PET or SUV PET, employs a set of images of normal subjects, and requires a reference object, while the transformation step is executed for each patient image to be standardized. In the calibration step, a standardized scale is determined along with 3 key image intensity landmarks defined on it including the minimum percentile intensity smin, median intensity sm, and high percentile intensity smax. smin and sm are estimated based on image intensities within the body region in the normal calibration image set. The optimal value of the maximum percentile β corresponding to the intensity smax is estimated via an optimization process by using the reference object to optimally separate the highly variable high uptake values from the normal uptake intensities. In the transformation step, the first two landmarks – the minimum percentile intensity pα(I), and the median intensity pm(I) – are found for the given image I for the body region, and the high percentile intensity pβ(I) is determined corresponding to the optimally estimated high percentile value β. 
Subsequently, intensities of I are mapped to the standard scale piecewise linearly for different segments. We employ three strategies for evaluation and comparison with other standardization methods: (i) Comparing coefficient of variation (CVO) of mean intensity within test objects O across different normal test subjects before and after standardization; (ii) Comparing mean absolute difference (MDO) of mean intensity within test objects O across different subjects in repeat scans before and after standardization; (iii) Comparing CVO of mean intensity across different normal subjects before and after standardization where the scans came from different brands of scanners. Results: Our data set consisted of 84 FDG-PET/CT scans of the body torso including 38 normal subjects and two repeat-scans of 23 patients. We utilized one of two objects – liver and spleen – as a reference object and the other for testing. The proposed standardization method reduced CVO and MDO by a factor of 3-8 in comparison to other standardization methods and no standardization. Upon standardization by our method, the image intensities (both for AC and SUV) from two different brands of scanners become statistically indistinguishable, while without standardization, they differ significantly and by a factor of 3-9. Conclusions: The proposed method is automatic, outperforms current standardization methods, and effectively overcomes the residual variation left over in SUV and inter-scanner variations.


Background and Rationale
Cancer is the second most common cause of death in the United States and is a significant health problem worldwide.
In 2019, about 1.8 million new cancer cases and about 0.6 million cancer deaths were reported in the United States 1 .
Positron emission tomography (PET), a non-invasive molecular imaging technique, is one of the major clinical imaging modalities used routinely for comprehensive body-wide diagnostic assessment of patients with cancer and non-cancerous disorders. PET detects, measures, and localizes gamma rays emitted from annihilation events between positrons (emitted by administered positron-emitting isotopes) and electrons, providing a method to distinguish tissues that have differential radiotracer activities. For example, abnormal changes in tissue metabolic activity can be detected with 18F-fluorodeoxyglucose (FDG)-PET imaging before structural changes are detectable with computed tomography (CT) or magnetic resonance imaging (MRI). As such, metabolic activity measured from FDG-PET is an important biomarker that is clinically utilized for diagnosis, staging, prognostication, and treatment response assessment in patients with cancer [2][3][4][5] .
Although qualitative assessment of PET images is routinely performed in clinical practice, quantitative assessment is encouraged to decrease inter-reader variability and to improve the diagnostic performance of study interpretation. In early attempts at quantitative disease assessment in PET images, the percent of administered dose per gram of tissue was used as a measure of tumor uptake 6 . However, after comparing this metric among different patients, it was discovered that this value is affected by patient size as well as by the radiotracer dose administered. To compensate for these factors, another quantitative measurement called the Standardized Uptake Value (SUV) was introduced, which is the decay-corrected tissue activity concentration of radiotracer in a region of interest (ROI) divided by the injected radiotracer dose per unit body weight (or alternatively body surface area or lean body mass) (see Equation 1) 7,8 . SUV measurement has been widely utilized for semi-quantitative PET assessment in clinical practice given its ease of use.
For any PET image I, the value I(v) at any voxel v represents the activity concentration (AC) of the radiotracer (in units of MBq/mL). This value is converted to SUV(v) at v by using the formula:

$$\mathrm{SUV}(v) = \frac{I(v)}{\text{injected radiotracer dose} \,/\, \text{body weight}}. \tag{1}$$

Note that injected radiotracer dose is in units of MBq, and that body weight is in units of g, where it is assumed that the average mass density of the human body is 1 g/mL (such that 1 g = 1 mL and SUV(v) is therefore unitless).

The factors that can adversely affect the accurate and precise measurement of tissue radiotracer uptake as portrayed in PET images can be divided into two categories: patient-related factors and technical factors. Patient-related factors include differences in body weight, body composition, body habitus, serum glucose levels, etc. Technical factors include differences in radiotracer uptake period, partial volume effects, the size and placement of the region of interest (ROI), image acquisition methods, attenuation correction methods, image reconstruction methods, etc. 3,9,10,11 . SUV only partially compensates for certain factors such as patient body weight and administered radiotracer dose.
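As an illustration of Equation (1), the following minimal sketch converts an AC array to body-weight SUV. The function name `ac_to_suv` and the example dose and weight are hypothetical and are not taken from the paper; the sketch also assumes that the input activity concentration has already been decay-corrected.

```python
import numpy as np

def ac_to_suv(ac_mbq_per_ml, injected_dose_mbq, body_weight_g):
    """Convert activity concentration (MBq/mL) to body-weight SUV per Equation (1).

    Assumes the AC values are already decay-corrected (an assumption of this sketch).
    """
    if injected_dose_mbq <= 0 or body_weight_g <= 0:
        raise ValueError("dose and body weight must be positive")
    # Injected dose per unit body weight (MBq/g); with 1 g taken as 1 mL,
    # dividing MBq/mL by MBq/g yields a unitless value.
    dose_per_gram = injected_dose_mbq / body_weight_g
    return np.asarray(ac_mbq_per_ml, dtype=float) / dose_per_gram

# Hypothetical example: 370 MBq injected into a 74 kg (74,000 g) patient.
ac = np.array([0.0, 0.005, 0.010])   # MBq/mL
suv = ac_to_suv(ac, 370.0, 74_000.0)
print(suv)                           # approximately [0, 1, 2]
```

Because the conversion is a single global scale factor per patient, it cannot correct for the nonlinear, tissue-dependent variations discussed in the text.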
The uncompensated factors can make accurate and reproducible disease quantification via PET acquisitions very challenging, potentially leading to diagnostic errors during disease staging and response assessment that may adversely affect patient management and outcome, not to mention site-to-site variations and their attendant issues.
Equally importantly, these factors cause numerical non-standardness of SUV and pose challenges to image processing and analysis methods. Even if it were possible to segment objects/pathology automatically with advanced deep learning methods in the presence of non-standardness, disease measurements within the segmented entities using SUVs would vary substantially. Needless to say, the original raw PET images from which voxel-wise SUV is estimated also pose challenges of at least similar magnitude. As such, methods have been developed to compensate for some of these uncompensated factors.

Related works
Some methods operate at the image acquisition level such as by using a phantom, by modifying image reconstruction, or by standardizing the parameters of scan data acquisition. Other methods operate at the patient level by controlling or correcting for the amount of radiotracer dose administered in the setting of radiotracer extravasation, compensating for the patient's serum glucose level, or by controlling the allowable delay time for radiotracer uptake. For example, Jahromi et al. compared the accuracy of SUV corrected by serum glucose levels (SUVgluc) to 4 other commonly used semi-quantitative metrics for evaluation of pulmonary nodules on FDG-PET scans and concluded that SUVgluc was the most accurate SUV parameter 15 . Laffron et al. proposed a simple method derived from kinetic model analysis to normalize decay-corrected SUV for injection-to-acquisition time differences within the range of 55-110 minutes in FDG-PET imaging 16 .
In yet other methods, standardization is performed at the image post-processing stage by using various methods such as digital PET phantoms, anatomical standardization with Z-scores, or various image transformation methods. Image-acquisition-level approaches are not very practical and cannot be used to analyze data sets that have been acquired without following the regimen required by them. Patient-level approaches do not fully correct for the nonstandardness of SUV, as there is often still variability in radiotracer uptake and since serum glucose level differentially affects FDG uptake within different tissue types, leading to overcorrections and under-corrections of SUV. Post-acquisition methods such as Z-scores generally perform a linear correction and do not account for nonlinear variations that often exist among data sets obtained from different patients. Also, most of these methods perform harmonization for a specific organ and cannot be applied to the whole-body PET images or to other organs without requiring major modifications. Moreover, they require the organ of interest to be segmented in order to normalize. Furthermore, a major drawback of current PET standardization/harmonization methods is the lack of appropriate and logical quantitative methods and metrics for evaluation. The goals of this paper are not only to demonstrate post-acquisition techniques to standardize activity concentration (AC) PET images as well as SUV PET images, but also to address the evaluation problems. We show how the proposed standardization techniques substantially improve tissue-specific meaning across patients upon standardization and also how the new metrics enable us to measure and compare among different standardization/normalization methods.
Standardization has been studied extensively for magnetic resonance imaging (MRI) starting with the method introduced by Nyul et al. a 20, 21 . They proposed a 2-step process consisting of calibration and transformation. In the calibration step, landmarks in the image intensity space (such as mean, median, quartiles, and deciles) derived from image histograms of the foreground of the image are found on a set of images for creating an intensity mapping model. In the intensity transformation step, the intensities of any given patient image are non-linearly mapped by using the landmarks to guide the transformation. One aspect of the MRI intensity standardization challenge that has direct relevance to AC PET and SUV PET images is the strategy to handle high outlier intensities. In MRI, these intensities have been shown to be due to noise and artifacts and have a similar behavior among the most commonly used MRI sequences 20 . In PET, particularly FDG-PET, which is the focus of this paper, they arise due to noise as well as the large dynamic range of high FDG concentrations in pathologic tissue regions and in some normal organs.
In MRI image analysis, the positive influence of intensity standardization on other image operations such as nonuniformity correction 22 , segmentation 23 , registration 24 , and even standardization itself has been demonstrated 22 20,21 . Normalization implies uniform (or linear) scaling of a variable, whereas the process under consideration involves non-linear mappings. Similarly, harmonization implies making image value meaning uniform without reference to a specific absolute standard value scale, whereas AC PET and SUV have standard meaning. Therefore, we suggest that standardization is a more appropriate term to describe the process.
always lying at or beyond the 99.8 percentile level 20 , independent of the MRI pulse sequence protocol. They are much harder to handle in PET/SUV images. (ii) In MRI, the outlier intensities and intensities due to pathology do not confound, as such calibration for standardization can be performed directly on the patient images irrespective of whether they are normal or abnormal. In PET/SUV images, such is not the case, and calibration must be performed based on normal scans. Furthermore, the cut off percentile is to be determined in a reference-organ-specific manner via an optimization process, as demonstrated in this paper. (iii) In view of (ii), PET standardization, unlike MRI, requires a reference organ whose normal uptake is low enough so that it is not mixed up with extremely variable high-uptake regions. High-uptake organs like heart, kidneys, and bladder are thus not useful as reference organs for PET standardization.
Despite efforts to control specific patient-related and technical factors, current PET images, including derived SUV measurements with an implicit standardization, still have considerable variability across subjects in similar tissue regions that are normal. Therefore, the goal of the standardization method is to reduce this variability from the overall effect of multiple variables, without focusing on any specific variables individually, under the assumption that we Our proposed methods do not assume that hepatic or splenic metabolism is exactly the same across normal subjects.
However, they do assume that normal hepatic or splenic metabolism should be within an expected range of variability amongst a population of subjects. Such an assumption is made all the time in the application of many types of diagnostic tests when reporting what is "normal" and "abnormal" in terms of the test results, which is largely based on our knowledge of human physiology, technical performance of the particular diagnostic test at hand, and observations of organ behaviors during PET scan interpretation.
Understanding what is "normal" is critically important to the detection, quantification, and diagnostic interpretation of PET images, as it allows one to 1) detect abnormality when present, even if subtle or diffuse within an organ of interest, given that once "normal" has been defined, everything that is "outside" normal can be defined as "abnormal"; 2) enable quantification of subtle disease and even inconspicuous disease when present beyond what is due to normal radiotracer uptake; and 3) improve accuracy of lesion-to-background measurements, which is important for quantitative assessments in cancer and in non-cancer related disorders.
Although PET scans reflect absolute measures of radiotracer uptake at the time of imaging as well as variations in imaging technique and human biological status, there is no reason to ignore information gleaned from use of populations of studies in terms of the normal level and range of radiotracer uptake within individual organs and from knowledge of human organ physiology in order to facilitate detection and quantification of pathology whenever present.
Our approach for both AC PET and SUV PET images, as described in Section 2, consists of a one-time calibration step, wherein the parameters of the standardization mapping are determined (learned), followed by the transformation step performed on any acquired patient image. Calibration is carried out by using only normal (or near-normal) images and separately for AC PET and SUV PET images, and the transformation step is applied to any given image normal or abnormal. Section 2 also describes our strategies for evaluating the effectiveness of standardization. In Section 3, we present detailed results in comparison to direct application of the MRI standardization approach and other standardization strategies. We state our concluding remarks in Section 4.
An early version of this work was presented at the SPIE Medical Imaging Conference held in Houston in February 2020 whose proceedings contained the abbreviated paper. The present paper differs from the conference paper in major ways: (i) It fully describes the background and rationale with a comprehensive review of the literature which was lacking in the conference paper. (ii) It gives full details of the method and all associated algorithms while the conference paper included just an outline for just the AC PET images and did not include SUV standardization. (iii) The evaluation is significantly expanded in this paper over the conference paper to include both AC PET and SUV PET images, comparative analysis with other methods, and repeat scan data sets of patients to show the reproducibility of the method.

Overview and Notations
Let $\mathcal{I}$ be a set of 3D PET images of a body region B, each comprised of a stack of sequential transverse slices. In this paper, we study standardization of both AC PET and SUV PET images. The standardization process is mostly the same for both. Thus, we may think of $\mathcal{I}$ as representing either a set of AC PET images or a set of SUV PET images. Our description will be general without referring to AC or SUV PET images specifically, except where the process deviates between them, in which case the differences will be explained.
For any image I in $\mathcal{I}$, we will denote its standardized image by Is, which is the result of applying the entire standardization mapping to I. Our standardization strategy employs certain landmarks or special features of interest in the image intensity or voxel value space, observable on image intensity distributions or histograms, defined as follows. For any image I in $\mathcal{I}$, we will denote its minimum and maximum intensities by

b It is not necessary for the whole image to be normal or disease-free. As we explain in Section 2.2, our method uses a reference organ such as liver. "Normal" implies that the reference organ should be normal in all images in the set $\mathcal{I}_c$.
c Our intent is to reduce the variation in image intensity values in "normal" tissues of the same type across subjects as much as possible while leaving the natural variations that exist unaltered.
In the following sections, we first explain the calibration and transformation steps and then describe our evaluation strategy together with a brief outline of two common methods from the literature with which we have compared our standardization method.

Defining and estimating pm(I): Based on our examination of body-wide FDG-PET/CT scans of 552 patients, the histogram of the full 3D AC PET and SUV PET images is typically bimodal. The first mode is situated close to 0 and corresponds to activity in the background of the image outside the body region, and the second mode represents the body region. Figure 2 displays the histograms of the full body torso 3D SUV PET image from FDG-PET/CT acquisitions of one normal subject and one cancer patient. A PET axial slice at the mid-abdominal level is also displayed in the figure.

We select pm(I) to be the median value within the body region (second mode) in I. To find the body region, we threshold I at the mean, denoted by mean(I), of the image intensity values over the whole volume of I. The thresholded results are also shown in Figure 2 for the two studies. This simple technique worked well, as verified on all 552 images tested. Note that perfect segmentation of the body region is not needed here since pm(I) is the median value within the segmented region and is not affected by minor imprecisions in the thresholded outcome.
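The body-region thresholding and median estimation described above can be sketched as follows. This is a minimal illustration on a synthetic volume, not the authors' implementation; the function name `body_median` and the synthetic intensities are assumptions made for the example.

```python
import numpy as np

def body_median(image):
    """Return (p_m, mask): the median intensity inside the body region and the mask.

    The body region is approximated by thresholding at the whole-volume mean.
    """
    image = np.asarray(image, dtype=float)
    mask = image > image.mean()          # threshold at mean(I)
    if not mask.any():
        raise ValueError("empty body region (constant image?)")
    return float(np.median(image[mask])), mask

# Synthetic bimodal volume: background close to 0, a "body" block near 2.
rng = np.random.default_rng(0)
vol = np.abs(rng.normal(0.0, 0.01, size=(20, 20, 20)))
vol[5:15, 5:15, 5:15] = rng.normal(2.0, 0.2, size=(10, 10, 10))

p_m, mask = body_median(vol)
print(p_m, int(mask.sum()))
```

Because the median is a rank statistic, a few misclassified voxels at the body boundary shift it very little, which is why a rough threshold suffices for this landmark.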

Calibration
To verify our assertion, we segmented the body region accurately in all data sets in $\mathcal{I}_c$ by thresholding at the volume mean followed by a filling operation and performing manual corrections as needed. We found the mean ± SD

Then the optimum upper percentile β is chosen to be that b which minimizes the objective function of Equation (2), evaluated for the reference object O, over all upper percentile values in a certain interval [bL, bH]. We have taken [bL, bH] = [90, 100].

The liver is commonly used as a reference organ in FDG-PET. For example, it is used as a reference organ in the PET response criteria in solid tumors (PERCIST) response assessment system because it is relatively stable and uniform in terms of FDG uptake from scan to scan, is well-defined and sufficiently large, and has higher FDG uptake than other background organs such as adipose tissue or lung so that it is easily visible and measurable 29 . Other more FDG-avid organs like brain and heart have much more variable FDG uptake between scans and more heterogeneous FDG uptake within the organs themselves. The spleen is more variable in terms of FDG uptake compared to liver, but still generally has uniform uptake and can also be used as a reference organ.
Therefore, in this work, we have used both liver and spleen as reference organs for estimating β in the calibration process. As we will demonstrate in Section 3, estimating β by using either of the two organs as reference yields the same value.
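The search for the upper percentile over [90, 100] can be sketched as a simple grid search. This is only an illustration under stated assumptions: the objective is taken to be the coefficient of variation, across calibration subjects, of the mean standardized intensity inside the reference organ; the mapping is a three-landmark piecewise-linear map; and the lower percentile `alpha`, the standard-scale values, and all function names are hypothetical.

```python
import numpy as np

def organ_cv_for_beta(images, organ_masks, b, scale, alpha=1.0):
    """CV (SD/mean) across subjects of the reference-organ mean after mapping
    each image with upper percentile b. `alpha` and `scale` are assumed values."""
    s_min, s_m, s_max = scale
    means = []
    for img, organ in zip(images, organ_masks):
        body = img[img > img.mean()]                       # rough body region
        p_a, p_m, p_b = np.percentile(body, [alpha, 50.0, b])
        low = s_m + (img - p_m) * (s_m - s_min) / (p_m - p_a)
        high = s_m + (img - p_m) * (s_max - s_m) / (p_b - p_m)
        std_img = np.where(img <= p_m, low, high)          # piecewise-linear map
        means.append(std_img[organ].mean())
    means = np.asarray(means)
    return means.std(ddof=1) / means.mean()

def choose_beta(images, organ_masks, scale, b_low=90.0, b_high=100.0, num=101):
    """Grid search for the b in [b_low, b_high] that minimizes the organ CV."""
    candidates = np.linspace(b_low, b_high, num)
    cvs = np.array([organ_cv_for_beta(images, organ_masks, b, scale)
                    for b in candidates])
    return float(candidates[np.argmin(cvs)]), cvs

# Synthetic calibration set: the same anatomy acquired with different gains.
rng = np.random.default_rng(0)
images, masks = [], []
for gain in (0.6, 1.0, 1.7, 2.4):
    img = np.zeros((16, 16, 16))
    img[2:14, 2:14, 2:14] = gain * rng.gamma(4.0, 0.5, size=(12, 12, 12))
    organ = np.zeros(img.shape, dtype=bool)
    organ[4:8, 4:8, 4:8] = True                            # stand-in reference organ
    images.append(img)
    masks.append(organ)

beta, cvs = choose_beta(images, masks, scale=(0.0, 1.0, 5.0))
print(beta, cvs.min())
```

In this sketch, organ masks are required only during the search, which mirrors the statement that reference organs are needed for calibration but not for transforming a new patient scan.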
We note that these organs are needed as reference only in the calibration step and not for performing standardization transform on a patient scan. guarantees that (x)  is 1:1 onto and hence invertible.

Intensity transformation
A given input test image I ∈ $\mathcal{I}_t$ is converted to a standardized image Is by using two mappings.

In summary, the proposed method of standardization consists of a one-time calibration step and a transformation step that is applied to any given image. The latter step does not require any segmentation mask. In the calibration step, key parameters of the standardization mapping are estimated from a given set of PET images of normal subjects. There are no parameters in the method that need manual or ad hoc adjustment, and the process is fully automatic once calibration is set up.
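A piecewise-linear landmark mapping of the kind outlined in this section can be sketched as below. This is a simplified illustration, not the authors' exact pair of mappings: it assumes three landmarks (lower percentile, median, upper percentile) found inside the thresholded body region, and it assumes that intensities beyond the outer landmarks are extended with the slope of the adjacent segment. The function names, the lower percentile of 1.0, and the standard-scale values for the minimum and median are hypothetical.

```python
import numpy as np

def find_landmarks(image, alpha, beta):
    """Landmarks (p_alpha, p_m, p_beta) computed inside the thresholded body region."""
    image = np.asarray(image, dtype=float)
    body = image[image > image.mean()]
    p_a, p_m, p_b = np.percentile(body, [alpha, 50.0, beta])
    if not (p_a < p_m < p_b):
        raise ValueError("landmarks must be strictly increasing")
    return p_a, p_m, p_b

def map_to_standard_scale(x, landmarks, scale):
    """Map intensities so that (p_alpha, p_m, p_beta) land on (s_min, s_m, s_max)."""
    p_a, p_m, p_b = landmarks
    s_min, s_m, s_max = scale
    x = np.asarray(x, dtype=float)
    low = s_m + (x - p_m) * (s_m - s_min) / (p_m - p_a)    # segment below the median
    high = s_m + (x - p_m) * (s_max - s_m) / (p_b - p_m)   # segment above the median
    return np.where(x <= p_m, low, high)

# Synthetic example with an assumed standard scale (s_min, s_m, s_max).
rng = np.random.default_rng(1)
vol = rng.gamma(2.0, 1.0, size=(16, 16, 16))
landmarks = find_landmarks(vol, alpha=1.0, beta=96.4)
scale = (0.0, 1.0, 5.0)
standardized = map_to_standard_scale(vol, landmarks, scale)
print(map_to_standard_scale(np.array(landmarks), landmarks, scale))
```

Both segment slopes are positive whenever the landmarks and the scale values are strictly increasing, so the map preserves intensity ordering and can be inverted.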

Iterative strategies
The method described above can be applied iteratively. That is, the standardization method can be applied to the

Evaluation Metrics
Our test set $\mathcal{I}_t$ consists of two cohorts of images: a set $\mathcal{I}_n$ of images of normal subjects, and another set $\mathcal{I}_r$ of images of non-normal subjects where repeated scans were available within 7 days of each other. For $\mathcal{I}_n$, our goal is to investigate how the mean intensity within certain objects O varies among all images in $\mathcal{I}_n$ before and after standardization. We expect the coefficient of variation of this mean intensity after standardization to be significantly lower than that before standardization since the subjects are considered normal e . For $\mathcal{I}_r$, our goal is to assess the difference in mean intensity within O between the two repeat scans for each subject. We expect this difference to be significantly lower after standardization than before standardization. For both evaluation strategies, the objects

MDO expresses the average of the normalized differences between the mean intensities within O in I1 and I2 over all corresponding pairs of images in $\mathcal{I}_r$. We hypothesize that MDO($\mathcal{I}_r^s$) will be significantly lower than MDO($\mathcal{I}_r$), where $\mathcal{I}_r^s$ denotes the set of standardized images corresponding to $\mathcal{I}_r$, for both AC PET and SUV PET images.

e Here "normal" means the entire liver and spleen are radiologically normal on the PET images and that the remainder of the image is radiologically near-normal with the exception of minor incidental abnormalities such as small liver cysts or lung nodules.
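The two evaluation metrics can be sketched in a few lines. The exact formulas are not reproduced here, so this sketch makes two assumptions: CVO is the sample standard deviation divided by the mean of the per-subject organ means, and each repeat-scan difference in MDO is normalized by the average of the two organ means. The function names are hypothetical.

```python
import numpy as np

def cv_percent(organ_means):
    """Coefficient of variation (%) of per-subject mean intensities within an object."""
    m = np.asarray(organ_means, dtype=float)
    return 100.0 * m.std(ddof=1) / m.mean()

def md_percent(means_scan1, means_scan2):
    """Mean absolute difference (%) between repeat scans, with each difference
    normalized by the average of the pair (an assumed normalization)."""
    a = np.asarray(means_scan1, dtype=float)
    b = np.asarray(means_scan2, dtype=float)
    return 100.0 * np.mean(np.abs(a - b) / ((a + b) / 2.0))

# Hypothetical liver means (SUV) for three normal subjects and three repeat-scan pairs.
print(cv_percent([2.1, 2.6, 1.8]))
print(md_percent([2.0, 2.4, 1.9], [2.2, 2.1, 1.9]))
```

Both metrics are scale-free ratios, so they can be compared directly between AC PET and SUV PET images even though the underlying intensity units differ.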

Data sets
This retrospective study was conducted following approval from the Institutional Review Board at the Hospital of the University of Pennsylvania along with a Health Insurance Portability and Accountability Act waiver. The following data sets were utilized for this study. Our data set $\mathcal{I}$ contains a total of 84 FDG-PET/CT scans with the following division of the scans among subsets: |$\mathcal{I}_c$| = 23; |$\mathcal{I}_t$| = 61; |$\mathcal{I}_n$| = 15; and |$\mathcal{I}_r$| = 46; note that $\mathcal{I}_t$ is the union of $\mathcal{I}_n$ and $\mathcal{I}_r$.

Experiments and Results
(i) Quantitative evaluation

For quantitative evaluation, we have conducted four experiments: (E1) for comparing coefficient of variation before and after standardization on normal data sets; (E2) for comparing mean absolute difference obtained before and after standardization on repeat scans; (E3) for comparing among iterative strategies; and (E4) for comparing performance on normal data sets obtained from different brands of scanners. For experiments E1 and E2, we also included other methods commonly used in the literature 30 , called the Gaussian normalization and Z-score normalization methods, as well as the original MRI standardization method 19,20 . We will refer to the Gaussian and Z-score methods as the G-method and Z-method, respectively. The normalized image in the Z-method is given by

In Table 1, we summarize our results from the two experiments E1 and E2 by listing CVO and MDO values before standardization and for the four methods after standardization for both AC PET and SUV PET images. In the table, $\mathcal{I}^s$ represents the set of standardized images corresponding to $\mathcal{I}$ ($\mathcal{I}_n$ or $\mathcal{I}_r$) output by each of the different methods.

Table 1. CVO and MDO values (%) for liver and spleen derived from data sets $\mathcal{I}_n$ and $\mathcal{I}_r$, respectively, before standardization ($\mathcal{I}$) and for the three methods after standardization ($\mathcal{I}^s$) for both AC PET and SUV PET images.

Column headers: Metric, Organ, AC ($\mathcal{I}$), G-AC

We make the following observations from this table.
(i) The proposed standardization method reduces CV significantly, by a factor of 3-4. Not surprisingly, the reductions are similar for AC PET and SUV PET images and for both organs.
(ii) Although the concept underlying SUV reduces variability somewhat (by about 10% for both organs), significant residual variability remains.
(iii) Compared to the mechanism underlying just standardization via SUV, the G- and Z-methods achieve slightly better harmonization of AC PET images, with the Z-method performing slightly better, but they both fail to improve beyond this level for SUV images. More importantly, note that these
(vii) Again, the proposed method outperforms the other methods and achieves a significant reduction in variations between repeat scans, with a residual variation of 3-6%.
In experiment E3, utilizing metrics CVO and MDO, we compared the following iterative strategies with the above non-iterative strategies: s-AC→s-AC; s-AC→SUV; s-AC→s-AC→SUV; s-SUV→s-SUV. The results are summarized in Table 2 for both liver and spleen. We make several key observations. (i) The SUVs resulting from s-AC→SUV are far less harmonized than those obtained by directly standardizing SUV PET images (s-SUV; see Table 1). However, they are slightly more harmonized than the original SUVs (4th column in Table 1). Although s-AC achieves substantial harmonization (see Table 1), the subsequent process of estimating SUVs from the standardized AC PET images introduces its own non-standardness. (ii) Repeated application of standardization (to AC PET and SUV PET images) does not seem to help since most non-standardness seems to be mitigated in the first application of standardization.

For experiment E4, our goal was to study how effective the s-AC and s-SUV methods are in standardizing data sets coming from different brands of scanner. Ideally, we would like to have a sufficient number of studies in $\mathcal{I}_r$ such that, for each subject, the repeated scans I1 and I2 of the same subject come from two different brands of scanners. Unfortunately, this is not the case, and so for E4, we chose data set $\mathcal{I}_n$, where we have 17 healthy women scanned on a Siemens Biograph mCT scanner and 21 healthy men scanned on a Philips Gemini TF scanner. We will refer to these two subsets by $\mathcal{I}_{n1}$ and $\mathcal{I}_{n2}$, respectively. For this assessment, we will assume that, upon standardization, similar SUVs are expected for the same organ in $\mathcal{I}_{n1}$ and $\mathcal{I}_{n2}$ since the subjects are normal. We conducted two experiments, one using a subset of $\mathcal{I}_{n1}$ as set $\mathcal{I}_c$ for calibration and another using a subset of $\mathcal{I}_{n2}$ as $\mathcal{I}_c$. In the first case, let $\mathcal{I}'_{n1} = \mathcal{I}_{n1} - \mathcal{I}_c$, and let $\mathcal{I}'^{s}_{n1}$ denote the standardized version of $\mathcal{I}'_{n1}$. Using the notations related to Equation (5)

In Figure 4, we show the plot of the objective function of Equation 2 as a function of the upper percentile variable b for each of liver (Figure 4(a)) and spleen (Figure 4(b)) taken as the reference object. For AC standardization, the optimal values β found for b with [bL, bH] = [90, 100] for liver and spleen were identical, namely, β = 96.4. Similarly, for SUV standardization, these values were identical, with β = 95.6. As seen in Figure 4, the coefficient of variation rises suddenly for b > ~98, suggesting a cutoff percentile beyond which image intensities are extremely variable from subject to subject.
To illustrate the uniformization effect of our standardization strategy, we display histograms of the liver AC PET images selected from n from 6 subjects in Figure 5 as follows: (a) before standardization; (b) after linear mapping determined by the maximum value in the image; and (c) upon standardization (s-AC). In Figure 6, we display histograms from SUV PET images from the liver of the same subjects where the layout is similar to that of Figure 5.
The purpose of (b) is to demonstrate that just a linear mapping of the entire AC/SUV range to a common scale does not help to standardize, and that standardization of the whole image requires a non-linear mapping. In fact, linear mapping makes matters worse: the histograms are more spread out after mapping. The point made in Table 1 about the SUV estimation process taking care of some, but not all, of the non-standardness existing in AC PET images is borne out in Figures 5(a) and 6(a). The histograms of subjects 5 and 6, which were far apart in AC PET images, come close together in SUV PET images. However, for other subjects, such a mitigation of non-standardness did not take place.

Finally, we demonstrate via image slice display at fixed gray map windows how uniformity of numeric meaning is achieved after standardization. Figure 7 displays (top row) an abdominal slice selected from each of the 6 SUV data sets in $\mathcal{I}_n$. The same slices from the same data sets after standardization are also displayed (bottom row). For each row, a fixed gray map window is used which is adjusted optimally for the first image in the row. It can be readily seen that standardization facilitates the use of fixed gray map windows, whereas fixed windows do not offer optimum slice visualization prior to standardization owing to non-standardness of intensity meaning.

As alluded to in Section 2, parameter smax was chosen to be 5.00 for SUV PET images so as to not lose intensity "resolution" in normal portions of the activity. The logic behind this selection is as follows. We require that any two distinct SUV values x and x+dx in I remain distinguishable after standardization, with a difference of at least dx. We assumed dx = 0.01 since, in clinical practice, this level of discriminability is adequate. By examining all SUV PET images that we analyzed and the associated standardization mappings, we found that smax ≥ 4.53 fulfills this requirement for dx = 0.01. Therefore, we set smax = 5.00. Similarly, for AC PET, we set smax = 50,000 Bq/mL.
For the calibration data set $\mathcal{I}_c$ of AC PET images, we observed that the scale factor of the mapping ranged from 2.66 to 13.49. The mean scale factor after leaving out extreme values was 6.00, and thus the inverse mapping, as a multiplication factor, was 1/6.00 = 0.167. For the calibration data set $\mathcal{I}_c$ of SUV PET images, we observed that the scale factor ranged from 1.66 to 5.94. The mean scale factor after leaving out extreme values was 2.50, and thus the inverse mapping, as a multiplication factor, was 1/2.50 = 0.400.

Discussion and Conclusion
We proposed a new methodology for standardizing AC PET and SUV PET images individually, called s-AC and s-SUV, respectively, to overcome the effect of undesired factors that impede accurate quantitative analysis for clinical and research purposes. The methods can be directly applied to AC/SUV PET images without requiring the parameters related to the scanner, image acquisition, or the patient. They consist of a one-time calibration step wherein the parameters pertaining to the standardization mapping are estimated once and for all using a reference organ. This is followed by the transformation step wherein any given image is subjected to the standardization mapping. The methods were evaluated using scan data from four different scanners via two metrics: (i) reduced variability in the scans of normal subjects within liver and spleen, and (ii) improved reproducibility of image intensities within these organs in repeated scans of patients with different pathologies. Improvement in uniformization is also demonstrated qualitatively through displays of histograms and images at fixed gray map settings.
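As a rough illustration of the two evaluation metrics, the sketch below computes (i) the coefficient of variation of an organ's mean intensity across subjects and (ii) the percent difference of that mean between two repeated scans. These particular formulas are assumptions made for illustration, since the exact metric definitions are not restated in this section, and all names and numbers are hypothetical.

```python
import numpy as np

def organ_mean(image, mask):
    """Mean intensity of the voxels selected by a boolean organ mask."""
    return float(np.mean(image[mask]))

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (inter-subject spread)."""
    values = np.asarray(values, dtype=float)
    return float(np.std(values, ddof=1) / np.mean(values))

def percent_difference(a, b):
    """Absolute difference relative to the pair mean, in percent."""
    return 100.0 * abs(a - b) / ((a + b) / 2.0)

# Hypothetical 2x2 "image" and organ mask (not data from the study).
image = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [True, False]])
liver_mean = organ_mean(image, mask)

# Hypothetical organ means from a scan and its repeat.
repeat_diff = percent_difference(2.0, 2.2)
```

Lower values of either quantity after standardization than before would correspond to the improvements described above.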
The proposed s-AC and s-SUV methods have been evaluated in comparison with two commonly used strategies, namely, Gaussian and Z-score intensity normalization, demonstrating the following key advantages: (i) s-AC and s-SUV significantly outperform the G- and Z-methods in terms of the above quantitative metrics. The latter methods do not seem to be able to go beyond the normalization achieved by the SUV process and leave considerable residual [...] Table 1. For spleen as well, the difference turned out to be less than 0.01% with β = 96.0. Another interesting finding is that direct standardization of SUV PET images is better than standardizing AC PET images and then converting them to SUV images (see Table 2). Our recommendation is that if AC PET images are needed for subsequent image processing/analysis operations, then perform s-AC processing, and if SUV images are the end goal, then perform s-SUV processing. Also, as shown in Table 2, one application of the standardization mapping takes care of the underlying non-standardness in AC/SUV PET images, and there is no benefit in repeated application.
We used liver and spleen separately as a reference organ. For FDG-PET imaging, the optimum values of β obtained for both organs are similar. If some other object or tissue region is used as reference, the optimum value of β needs to be estimated via Equation (3) by using data set c in the calibration step. Similarly, if one does PET imaging with radiotracers other than FDG, then the liver and spleen may not necessarily be the best choice since the accumulation and distribution of radiotracer uptake may differ from that of FDG, and therefore, the estimation of β may have to be redone for each individual type of radiotracer utilized. Although performed on a small sample, our analysis indicates that the standardization mapping can mitigate variations potentially coming from different brands of scanners.
One limitation of this work is the rather small number of cases utilized in testing method performance, especially as related to inter-scanner variation of SUVs. Although our existing data sets came from multiple scanners, we did not have a sufficiently large number of studies from each of several brands of scanners. One of our future goals is to acquire such data sets and test our method's ability to standardize both AC PET and SUV PET intensities across all major brands of scanners currently used in clinical practice.
In summary, the proposed s-AC and s-SUV algorithms involve a one-time calibration step which requires a set c of FDG-PET data sets of normal subjects and the segmentation mask of a reference organ or tissue region for each image. All parameters needed by the method are then estimated automatically by the algorithms. Subsequently, any given FDG-PET image of a patient can be standardized automatically by using the parameters estimated in the calibration step. The algorithms are easy to implement and computationally inexpensive. Their ability to drastically reduce variations inherent in the existing SUV measurement process, especially as evidenced by our repeated scan experiments, suggests that the s-SUV measures may be used highly reliably for disease measurement.
[...] manuscript. Odhner: algorithm implementation and software development. Torigian: data gathering, clinical application, interpretation of results, manuscript preparation.

Acknowledgements:
We acknowledge the American College of Radiology Imaging Network (ACRIN) as the source of shared data (which received funding support of the National Cancer Institute through grants U01 CA079778 and U01 CA080098) obtained from the ACRIN 6678 prospective multicenter trial in adult patients with locally advanced or metastatic (stage III or IV) non-small cell lung cancer.