In this study, we proposed a multi-label CNN model to automatically segment tumoral lesions and background regions of healthy brain tissue in 18F-FET PET scans. Unlike previous studies [23], in which only positive 18F-FET PET scans were used to train the CNN model, we included both negative and positive cases to train and test the CNN models. As a result, the model can also be evaluated for tumor detection and for the classification of 18F-FET PET scans into positive and negative scans. Compared to a CNN model focusing only on tumor segmentation, which gave 11 FP cases and 1 FN case, the multi-label CNN model had a slightly lower sensitivity with 2 FN cases but a substantially higher specificity with only 2 FP cases. This can be explained by the fact that an additional label for the background region, representing healthy brain tissue with non-specific uptake, can reduce FP detections in which part of the healthy tissue would otherwise be classified as tumoral tissue. In both FP cases of the multi-label CNN model, uptake in the identified lesion was higher than the background uptake: one was an inflammatory lesion (encephalitis) in the striatum and the adjacent insular cortex, and the other was located at the bottom of the transmantle sign below the frontal lesion (see Fig. 2). While the model correctly detected these lesions with increased relative uptake, the increase was not high enough for the physician to classify them as malignant. Conversely, both FN cases were lesions located in central brain structures, one in the posterior cingulate gyrus and the other in the lamina quadrigemina, lamina thalamus, and roof of the fourth ventricle, and both showed only a very limited increase in FET uptake (see Fig. 2).
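The scan-level evaluation described above can be illustrated with a minimal sketch. It assumes that a scan is called positive as soon as the predicted label map contains tumor voxels, and that the tumor label is encoded as 2; neither the label encoding, the `min_voxels` threshold, nor the function names come from the study, and the toy labels in the usage lines are not study data.

```python
import numpy as np

# Assumed label convention (hypothetical, not stated in the paper):
# 0 = unlabeled, 1 = background region, 2 = tumoral lesion.
TUMOR_LABEL = 2


def scan_is_positive(label_map, min_voxels=1):
    """Call a scan positive if at least `min_voxels` voxels carry the tumor label."""
    n_tumor = int(np.count_nonzero(np.asarray(label_map) == TUMOR_LABEL))
    return n_tumor >= min_voxels


def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) from per-scan boolean decisions."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(predicted & reference)    # positive scans detected
    fn = np.count_nonzero(~predicted & reference)   # positive scans missed
    tn = np.count_nonzero(~predicted & ~reference)  # negative scans kept negative
    fp = np.count_nonzero(predicted & ~reference)   # negative scans called positive
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy usage with made-up decisions for five scans.
toy_predicted = [True, True, False, False, True]
toy_reference = [True, True, True, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_reference)
```

Under this convention, each FN case lowers sensitivity and each FP case lowers specificity, which is how the FP and FN counts reported above translate into the two metrics.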
In addition to a CNN-based approach using the 20 to 40 min static 18F-FET PET scan, we also considered a kinetic filtering approach so that the additional information in dynamic 18F-FET PET scans could be used for the detection of tumoral lesions. For this approach, three kinetic classes were defined for healthy tissue, LGG, and HGG, respectively, by applying the background and tumor ground truth labels to the dynamic 18F-FET PET training data, while a fourth kinetic class was included for blood pool by thresholding the initial perfusion-weighted frames of the dynamic 18F-FET PET training data. However, evaluation on the dynamic 18F-FET PET test data resulted in all scans being classified as positive, mainly because it proved very challenging to discriminate between blood pool and tumoral uptake, with part of the blood pool generally being classified as tumoral tissue. Therefore, the kinetic filtering approach did not prove suitable for the detection of brain tumoral lesions, although it accurately identified healthy brain tissue and quantified background uptake, which supports the applicability of this approach for identifying brain reference tissues. In contrast, given its high sensitivity and specificity, the multi-label CNN model can be considered for clinical routine to assist the nuclear physician in detecting tumoral lesions in 18F-FET PET scans.
In terms of segmentation performance, the automatic segmentations obtained with the multi-label CNN model closely approximated the manual delineations provided by a nuclear physician for both the tumor and the background region. Compared to an average DSC of 73.7% for a lesion-only segmentation approach, the multi-label CNN approach performed comparably, with an average DSC of 74.6% for the lesion segmentations. As a result, the maximal lesion uptake was always extracted accurately from the multi-label CNN-based lesion segmentations. In addition, the automatic lesion segmentations provided by the multi-label CNN model can support the volumetric analysis of brain lesions, as the tumor volumes estimated by the multi-label CNN model were similar to the tumor volumes obtained with the manual approach. Therefore, the multi-label CNN model can be used for the extraction of textural features from 18F-FET PET scans, so that random forests (RF) or other supervised machine learning algorithms can use these features to classify brain lesions as tumoral or benign tissue [28]. These more advanced analyses have not yet entered standard clinical practice, mainly because of the additional workload needed to obtain an accurate segmentation, which generally still requires manual interaction to prevent vessel structures or blood pool from being identified as tumoral lesions. In line with its limited performance on lesion detectability, the kinetic filtering approach also did not provide accurate segmentations of the brain lesions, mainly because of the confounding blood pool signal, so that the maximal lesion uptake could not be extracted accurately either. In contrast, our proposed multi-label CNN approach provided automatic and accurate lesion segmentations and generated an accurate estimate of the maximal lesion uptake without user interaction.
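For reference, the DSC between an automatic and a manual segmentation is twice the overlap volume divided by the sum of the two volumes. The sketch below computes it for binary masks with NumPy; the function name and the choice to return 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
import numpy as np


def dice_coefficient(pred_mask, ref_mask):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(ref_mask, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    total = np.count_nonzero(pred) + np.count_nonzero(ref)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = np.count_nonzero(pred & ref)
    return 2.0 * overlap / total


# Toy usage: two 4-voxel squares that share 2 voxels.
toy_pred = np.zeros((4, 4), dtype=bool)
toy_ref = np.zeros((4, 4), dtype=bool)
toy_pred[0:2, 0:2] = True
toy_ref[0:2, 1:3] = True
toy_dsc = dice_coefficient(toy_pred, toy_ref)
```

The empty-mask convention matters when negative scans are included in an evaluation, since a lesion DSC is otherwise undefined for scans without a lesion in either segmentation.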
In addition, the multi-label CNN model was able to segment a background region in a way that is very similar to current clinical routine, in which a crescent-shaped VOI in the non-affected hemisphere is delineated as the background region. For positive scans with lateralized lesions, the model always delineated the background region on the contralateral side, in line with the manual delineations. For negative scans, the background region was mainly segmented as a unilateral crescent-shaped VOI on the side contralateral to the manual background delineation, while for some negative cases the CNN model segmented a bilateral crescent shape as the background region. However, for both positive and negative scans, the average background uptake derived from the multi-label CNN-based background region closely approximated the average background uptake obtained from the manually delineated background region. As a result, the MI or TBR, which is an important quantitative parameter for lesion characterization [29], could be estimated with high accuracy from the lesion and background segmentations provided by the multi-label CNN. Furthermore, these lesion and background segmentations can be projected onto the earlier time frames to extract the tracer kinetic profile of the lesions and to differentiate between LGG and HGG lesions.
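As an illustration of how these quantities follow from the two segmentations, the sketch below computes a TBR as the maximal lesion uptake divided by the mean background uptake, and extracts a time-activity curve by applying a mask to every frame of a dynamic series. The exact TBR definition used in the study, the frame-first array layout, and the function names are assumptions made for this example.

```python
import numpy as np


def tumor_to_background_ratio(image, lesion_mask, background_mask):
    """Maximal lesion uptake divided by mean background uptake (assumed TBR definition)."""
    image = np.asarray(image, dtype=float)
    lesion = np.asarray(lesion_mask, dtype=bool)
    background = np.asarray(background_mask, dtype=bool)
    if not lesion.any() or not background.any():
        raise ValueError("both masks must contain at least one voxel")
    return image[lesion].max() / image[background].mean()


def time_activity_curve(dynamic, mask):
    """Mean uptake inside `mask` for each frame.

    `dynamic` is assumed to be laid out as (frames, *spatial dims) and
    `mask` must match the spatial dims.
    """
    dynamic = np.asarray(dynamic, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if dynamic.shape[1:] != mask.shape:
        raise ValueError("mask shape must match the spatial shape of each frame")
    # Boolean indexing over the spatial axes yields (frames, n_voxels).
    return dynamic[:, mask].mean(axis=1)


# Toy usage with made-up values on a 2 x 2 image.
toy_image = np.array([[1.0, 1.0], [3.0, 2.0]])
toy_lesion = np.array([[False, False], [True, True]])
toy_background = np.array([[True, True], [False, False]])
toy_tbr = tumor_to_background_ratio(toy_image, toy_lesion, toy_background)
toy_tac = time_activity_curve(np.stack([toy_image, 2.0 * toy_image]), toy_lesion)
```

Because both functions take the masks as inputs, the same code applies whether the lesion and background regions come from the manual delineations or from the CNN output.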
The main limitation of this study is that only the 20 to 40 min time frame of the dynamic 18F-FET PET scan was used for CNN-based classification and segmentation, whereas a full dynamic 18F-FET PET scan could provide additional information because of differences in tracer kinetics between tissue types. Although this will require considerably more memory and computational power, future efforts will focus on extending the multi-label approach to also differentiate between HGG and LGG and on using more time frames from the dynamic 18F-FET PET scan as input to the CNN model to further improve classification and segmentation performance. In addition, the current evaluation was limited to 18F-FET PET data from a single PET center, although two different PET systems were used for data acquisition. Future evaluations could include multi-center PET data acquired with a broader range of PET systems and scanning protocols, as well as with amino acid PET tracers other than 18F-FET.