In this study, a non-invasive and contactless methodology, thermal infrared imaging (IRI), has been used to accurately detect the boundaries of the tumor tissue on the exposed cortex during neurosurgery. Thirteen subjects with heterogeneous tumors (Table 1) were enrolled. The experimental protocol consisted in a baseline (BL) phase, a cold stress phase, with cold physiological solution injection, and a recovery (REC) phase. Thermal imaging was acquired during the whole experiment.
After reporting the boundary location of the tumor lesion on thermal IR imaging by means of a co-registration with visible imaging, salient features were extracted in both time (TD) and frequency domain (FD) in the context of the only BL phase or relatively to the whole experiment (BL + REC). Different supervised machine learning based models were developed for each patient, given the heterogeneity of the tumors. The labels of the two classes (i.e., 0 for the healthy tissue pixels and 1 for the tumor tissue pixels) were given on the basis of the boundary defined by the neurosurgeon relying on pre-operative MRI.
A preliminary inspection of the features revealed statistical significance when comparing class 0 vs. class 1 pixels relatively to the values of both TD and FD features. In particular, referring to Fig. 3 the most influencing features in TD were the c parameter and the STDREC. The c parameter is related to the initial value of the temperature of the pixels after the cold saline injection and on average the t value is positive, meaning that, in general, the starting temperature after cold stress of the healthy pixels is higher than the tumor area pixels. This finding could be interpreted as the tendency of the tumor pixels not to react quickly to cold stress and to remain in the perturbed condition longer than the healthy pixels. The other most influencing parameter in TD features inspection was the standard deviation of the thermal signals of the pixels during REC phase, i.e. after the cold saline injection. The t-value, in this case, is negative meaning that the STD of class 1 pixels is higher than the STD of class 0 pixels during the thermal recovery. This result shows the difference of thermal characteristics of the two areas of the brain and reflects the scattered behavior of the tumor area with respect to the healthy regions, which behaves more uniformly.
Referring to the FD features (Fig. 4), instead, t-tests results showed statistical significance for all the analyzed frequency bands, and the maximum of the average t-value was at f=[0.69–0.73] Hz, which is in the Cardiac band (Fig. 4b). This finding means that the wavelet coherence is able to discriminate tumor from healthy pixels more efficiently in the above mentioned frequency band with respect to all the other bands under consideration. Referring to Fig. 4b, the results are represented for the contrast class 1 vs. class 0, therefore a high value of t means that the wavelet coherence in the tumor area pixels is higher compared to the healthy pixels. This means that the healthy pixels behave differently from the tumor area pixels for all the analyzed frequency bands, especially with a high impact on the cardiac band.
Concerning the developed supervised machine learning approach, the results showed the possibility to segment the tumor lesion with respect to the healthy brain regions with high performances with every one of the developed models, reporting an accuracy that on average is always more than 70% (Fig. 6a). Among the four models, the best in terms of accuracy was the FD based classifiers relaying on the whole experimental session features (BL + REC). In this case, the accuracy was on average 90.45%, whereas for FD based classifiers relaying on the only BL features it was 72.30%. With regard to TD based models the accuracies were 78.56% and 79.86% on average, for BL and BL + REC features respectively.
Also, the sensitivity was higher for the FD classifiers relaying on the whole experimental session features (BL + REC), with 84.64% that was notably higher than the other models (30.87% for TDBL, 63.49% for TDBL+REC and 36.65% for FDBL). The models with the highest specificity, instead, were the TD classifiers relaying on the whole experimental session features (BL + REC), with 97.45% on average that was similar to the levels of specificity of TDBL and FDBL+REC models, with 95.27% and 93.74%. The lowest specificity was reported for the FDBL classifiers with a level of 75.45%.
The reported results demonstrated that FD classifiers relaying on the whole experimental session features (BL + REC) performed better with respect to the other three developed models, with high accuracy and sensitivity. This result can certainly be traced back to the fact that the input features are multiple and offer greater detail on the observed phenomenon. To note, referring to Fig. 6, it is possible to observe that also the FD classifiers relying on the only BL features had good performances and it is an important finding, because the classifier model would rely on a thermal imaging video of only one minute and without any additional measurement phase (i.e. cold stress), thus resulting more convenient during neurosurgical interventions.
In addition, an exploratory analysis was executed to relate the performances of the best models (FDBL+REC ) to the tumor categories in the sample dataset. Among all the tumor categories, FDBL+REC models seemed to perform better for metastatic tumors, with the highest values of accuracy and sensitivity. To note, the performances relative to the other classes of tumors were also very promising, being the values of accuracy, sensitivity and specificity always higher than 80%.
It is worth to note that the present work demonstrated that machine learning models based on FD features are more effective and performing that the TD features. This particular result can be traced back to the fact that the decomposition into frequency bands makes it possible to evaluate the characteristics of the signal, and in particular the specific correlation of the thermal signals in detailed frequency bands, with greater specificity and refinement with respect to the temporal signal analysis. This finding is of paramount importance also to understand the application of thermal IR imaging in the biomedical field. In fact, the IR imaging allows to assess the integration of several physiological mechanisms, which all together, affect the thermal pattern of a tissue (e.g., micro- and macro-circulation, metabolic activity of the tissue, exchange of heat with the environment) [29]. Frequency analysis of thermal signal permits of course to find the single most informative components of the underlying phenomena, allowing to obtain a more detailed insight on the dataset. Indeed, it has been largely employed in the field of thermal IR imaging applied on human studies [30–32]
It is of fundamental importance to observe that the present work is highly innovative given that it is the first time that a machine learning classifier relying on features extracted from a completely non-invasive and contactless technique has been used to segment the tumor area from the health tissue with outstanding performances.
However, several limits affected the present study. The first is related to the limited sample size. Machine learning models are based on supervised learning and the performances are highly affected by the numerosity of the study sample. Increasing the numerosity of the patients could reduce the overfitting risk. Of note, the results are cross-validated, hence the generalization performances of the model are indeed investigated, but enlarging the sample size could improve the classification outcomes. Moreover, the effect of the limited sample size can be also observed in Table 1, which shows that for some patients the thermal behaviour is not always in line with that reported in the literature, especially for patients affected by glioblastoma. In this case, many patients showed higher average basal temperature of the tumor area compared to the healthy tissue. This result could be due to the inclusion of blood vessels in the region of interest of the tumor, thus increasing the average temperature of the area. However, this is beyond the scope of this work which focuses on identifying the boundaries of the tumor area to support the neurosurgeon in brain resection. Indeed, this allows to highlight the good qualities of the developed models to classify the nature of the pixels, focusing on the single frequency components, thus allowing to consider various physiological aspects of the underlying process.
Second, the best classifiers are obtained for features relying on a time period of acquisition of nearly three minutes and on an experimental session consisting on injection of cold physiological solution. This time slot could be reduced to one minute at the price of decreasing performances. However, spraying with saline is a common practice during neurosurgery, thus constituting a mild limitation.