The clinical performance of commercially available NCITs was assessed using 1022 adult subjects in a controlled setting. The accuracy of the NCITs in a clinical setting was evaluated using:
- The clinical bias and the temperature measurement inconsistency represented as standard deviation (Table 2).
- The differences in the average temperature measurements between the NCIT and reference thermometer (Table 3).
- Number of measurements falling outside of the accuracy stated by the manufacturer (Table 3).
- Sensitivity and specificity for predicting a subject’s temperature above 38°C (Fig. 4).
This study incorporated a very large sample size (> 1000 subjects) and used multiple NCIT models. Our results indicated that both clinical bias and uncertainty for the six NCIT models exceeded the stated accuracy in their product labeling. Only one of the six NCIT models (Model C) had a clinical bias within the manufacturer’s stated accuracy (Table 3). Depending upon the NCIT model, 48–88% of the individual temperature measurements were outside of the labeled accuracy stated by the manufacturers (Table 3). Even for Model C, which had the lowest clinical bias, 50% of the individual measurements fell outside the stated accuracy. Model E, with the highest clinical bias, had 88% of the data falling outside the stated accuracy. Statistical analysis also showed that the NCIT measurements from all six models were statistically different from the corresponding reference thermometer measurements. Overall, all our metrics highlight challenges with measuring a subject’s temperature and resulting credibility issues with NCIT measurements in a controlled setting according to the manufacturer’s instructions for use.
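The metrics discussed above (clinical bias, standard deviation of the error, and the share of readings outside the labeled accuracy) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `clinical_accuracy_summary`, the sample readings, and the ±0.3°C stated accuracy are all assumptions chosen for the example, and the error is assumed to be defined as the NCIT reading minus the reference reading.

```python
from statistics import mean, stdev


def clinical_accuracy_summary(ncit_temps, reference_temps, stated_accuracy):
    """Summarize paired NCIT/reference readings (all values in degrees C).

    Hypothetical helper: error is taken as NCIT minus reference.
    """
    if len(ncit_temps) != len(reference_temps):
        raise ValueError("NCIT and reference readings must be paired")
    errors = [n - r for n, r in zip(ncit_temps, reference_temps)]
    # Count readings whose absolute error exceeds the labeled accuracy.
    outside = sum(1 for e in errors if abs(e) > stated_accuracy)
    return {
        "clinical_bias": mean(errors),       # average error
        "std_dev": stdev(errors),            # sample standard deviation of the error
        "fraction_outside": outside / len(errors),
    }


# Made-up readings for illustration only; these are not study data.
ncit = [36.5, 37.0, 36.0, 37.5]
oral = [37.0, 37.0, 37.0, 37.0]
summary = clinical_accuracy_summary(ncit, oral, stated_accuracy=0.3)
print(summary)
```

A device can show a small clinical bias and still place many individual readings outside the labeled accuracy, because positive and negative errors cancel in the average but not in the per-reading count.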
The accuracy of NCIT devices is currently evaluated using the ASTM E1965 and ISO 80601-2-56 standards. Both standards require the laboratory error to be within ±0.3°C. Laboratory error is measured against a standardized BBS under controlled conditions and does not include errors introduced by the proprietary software algorithm, user error, physiological variability, or environmental factors. Therefore, in a clinical setting, the variability in the NCIT temperature measurement is expected to be greater than the laboratory error. Our study illustrated that the error (ΔT) can range from −3°C to +2°C in extreme cases, with the majority of the errors ranging from −2°C to +1°C (Figs. 2 and 3), outside of the manufacturer’s stated accuracy (Table 3). Our study protocol was designed to minimize the inaccuracies due to user error (Δuser) and environmental factors (Δenvironmental). In a real-world setting (e.g., transit centers, PoEs, pre-clinical triage, and other screening locations), additional inaccuracies and variability will only increase the error in NCIT-measured body temperature unless the measurement protocols control for these factors.
Our results showed that the error in the NCIT readings appears to depend upon the subject’s temperature (Fig. 3). The linear regression of the NCIT measurement error with respect to the subject’s oral temperature for all NCIT models showed a negative slope. As the subjects’ temperatures increase, the NCIT readings transition from overestimating to underestimating the oral temperature.
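The regression described above can be illustrated with an ordinary least-squares fit of the NCIT error against the oral temperature. This is a sketch under stated assumptions: `fit_line` is a hypothetical helper, the four data points are invented to produce a negative slope, and the error is assumed to be the NCIT reading minus the oral reading. The temperature at which the fitted line crosses zero marks the transition from overestimating to underestimating.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values for illustration only; these are not study data.
oral_temps = [36.0, 37.0, 38.0, 39.0]      # reference readings, degrees C
ncit_errors = [0.5, 0.0, -0.5, -1.0]       # NCIT minus oral, degrees C

slope, intercept = fit_line(oral_temps, ncit_errors)
# Oral temperature at which the fitted error changes sign.
crossover = -intercept / slope
print(slope, intercept, crossover)
```

With a negative slope, readings below the crossover temperature are overestimated on average and readings above it are underestimated, which is the pattern that matters most for fever detection.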
There are several potential explanations for the negative slope. One possibility is that the reference thermometer was inaccurate. Another possibility is that the offset algorithms used to convert the forehead temperature measured by the NCIT to oral temperature were inaccurate. Our reference thermometer was calibrated for accuracy across the operating temperatures (Attachment A), and the calibration data showed that its accuracy was not dependent on the measured temperature. In addition, the reference temperature was obtained using a contact (oral) probe, which tends to be more reliable than non-contact measurement. Therefore, our data indicate that the root cause of this negative slope can likely be attributed to the offset algorithm in the NCITs. Further analysis should be done to understand and address the limitations of the existing offset algorithms in the NCITs.
Based on the sensitivity analysis (Fig. 4), our study showed that some of the NCITs are likely to generate significant false negative readings when used for fever detection. The sensitivity of the NCIT models at 38°C, the CDC-defined temperature threshold,1 ranged from 0 to 0.69. Four of the six NCIT models had a sensitivity less than 0.5, with two of them below 0.1. Because the false negative rate is one minus the sensitivity, four of the six models had a false negative rate of more than 50%. Because of the high probability of producing false negative readings close to the CDC threshold, these NCITs are an unreliable stand-alone temperature screening tool.
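The sensitivity and specificity figures above can be sketched as a simple classification count at the screening threshold. This is an illustration, not the study's analysis code: `sensitivity_specificity` is a hypothetical helper, the readings are invented, and treating a reading of exactly 38°C as febrile (`>=`) is an assumption.

```python
def sensitivity_specificity(ncit_temps, reference_temps, threshold=38.0):
    """Classify paired readings against a fever threshold (degrees C).

    The reference reading defines the true state; the NCIT reading is the
    screening result. Returns (sensitivity, specificity); a value is None
    when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for ncit, ref in zip(ncit_temps, reference_temps):
        febrile = ref >= threshold   # assumption: threshold itself counts as fever
        flagged = ncit >= threshold
        if febrile and flagged:
            tp += 1
        elif febrile:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Invented readings for illustration only; these are not study data.
oral = [38.5, 38.2, 38.0, 39.0, 37.0, 36.8, 37.5, 37.9]
ncit = [38.1, 37.6, 37.4, 37.9, 36.9, 37.0, 38.2, 37.5]

sens, spec = sensitivity_specificity(ncit, oral)
false_negative_rate = 1 - sens
print(sens, spec, false_negative_rate)
```

In this invented example, three of the four febrile subjects are missed, so the sensitivity is 0.25 and the false negative rate is 0.75, the kind of result that makes a device unsuitable as a stand-alone screen.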
Our study included over one thousand subjects and six different NCIT models (ten units of each model for a total of sixty thermometers). The measurements were obtained under well-controlled conditions; however, we recognize that the study has several limitations. Subjects under the age of 18 were not included. The number of subjects with temperature measurements ≥ 38°C was approximately 5% of the total sample. Nonetheless, the statistical analysis showed there were sufficient subjects to analyze the adequacy of the NCIT accuracy. While there are many commercially available NCITs, for practical purposes, we focused our study on six NCIT models from different manufacturers over a wide price range. We chose these NCITs because they all targeted the center of the forehead. While we evaluated the inter- and intra-model variability in accuracy, other confounding clinical factors such as sex, age, skin tone, and weight were not considered and should be evaluated in subsequent studies.
While oral temperature measurement is widely used in public settings as a surrogate for core temperature, it may not provide as robust a measure of core temperature as the pulmonary artery (PA) temperature. The purpose of this study was not to correlate oral to PA temperatures, but to evaluate the ability of the NCITs to report temperatures correlating to oral temperatures, as advertised in their literature and instructions for use, and as an operational mode in all the NCITs tested. No NCIT tested in this study had a true core temperature mode. While PA temperature measurement would be ideal, similar comparisons have been made between infrared cameras and oral thermometry13.
Overall, our results indicate that some NCIT devices may not be consistently accurate enough to be used as a stand-alone temperature measurement tool to determine if the temperature exceeds a specific threshold (e.g. 38°C) in an adult population. Model-to-model variability and individual model accuracy in the displayed temperature are a major source of concern. Users should be aware of the consequences of false negatives and false positives when using NCITs as a screening tool.
In addition, it is critical to follow the manufacturer’s instructions for use to minimize inaccuracies due to user error and environmental factors and to ensure optimal results from these devices. The FDA published a fact sheet that contains recommendations for minimizing some of the inaccuracies in NCIT measurements.14 Factors affecting NCIT temperature measurements and their interpretation should be considered when developing the temperature measurement protocol and screening criteria.