The FIA score was defined as a simple numeric value that allows us to assess the importance of a given feature for a specific classification. A FIA score can take values between 1 and the number of features of the respective data; examples would be FIA = 1.0 or FIA = 4.33. Lower FIA scores indicate more impactful features. A feature that was not assigned a FIA score has no measurable impact on the classification results in the analyzed dataset.
The FIA method was tested on a dataset of microbial metabolites measured by means of 1D 1H NMR (Wang et al., 2021a). The dataset consisted of NMR spectra from various bacterial strains grown in a culture medium. It has the advantages of well-defined, distinctive groups and controlled experimental conditions, which minimize variability. In addition, the metabolomics data have been characterized in an earlier publication. The dataset contained 80 spectra with 1384 bins (features), some of which have been previously annotated with metabolite identities.
3.1 Visualization of FIA scores
We visualized the obtained FIA scores using adapted volcano plots, shown in Fig. 1. Features with FIA scores of less than 10 are highlighted in red (decreasing during microbial growth) or blue (increasing during growth). Identified signals were labeled in the plots. These plots closely resemble the volcano plots based on p-values obtained for this data (Wang et al., 2021a). It should be noted that not all features in a dataset will be assigned a FIA score. Therefore, in contrast to “regular” volcano plots, not all features are shown in this kind of volcano plot. Next, we further investigated the relationship between FIA scores and p-values.
3.2 Relationship of FIA scores to p-values
We hypothesized that the FIA algorithm should be able to identify features that were identified as significant by statistical tests, assuming that successful ANN models will make (at least partial) use of such features when predicting outcomes. To test this, we compared the p-values of features with FIA scores <4 to those of all other features of the same group. This comparison is shown in Fig. 2A for all groups in the dataset. Low FIA scores were associated with lower p-values in all groups. In all but one group, this difference was significant (p ≤ 0.05). The group for which the difference was not significant (p = 0.147) is Pseudomonas. It is noteworthy that in the original publication on this dataset, this group had by far the lowest number of significant features, combined with low overall change in feature intensities, potentially linked to relatively slow growth rates of Pseudomonas (Wang et al., 2021a). Groups with only a few significant features might force the ANN to use a larger number of less significant features to predict the outcome, as compared to groups with strongly correlated features. From the volcano plots we concluded that FIA scores of less than 4 seemed to be the most impactful in this dataset.
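The comparison described above can be illustrated with a minimal sketch. The text does not state which statistical test was used for Fig. 2A, so the one-sided Mann-Whitney U test below is an assumption, as are the function name `compare_p_values` and the dictionary-based input format.

```python
from scipy.stats import mannwhitneyu


def compare_p_values(fia_scores, p_values, cutoff=4.0):
    """Test whether features with FIA < cutoff have lower p-values than the rest.

    fia_scores: dict feature -> FIA score (features without a score are absent)
    p_values:   dict feature -> p-value from the univariate test
    The Mann-Whitney U test is an assumed choice, not the documented method.
    """
    def is_low(feature):
        return feature in fia_scores and fia_scores[feature] < cutoff

    low = [p for f, p in p_values.items() if is_low(f)]
    rest = [p for f, p in p_values.items() if not is_low(f)]
    # One-sided alternative: p-values in the low-FIA set tend to be smaller.
    return mannwhitneyu(low, rest, alternative="less").pvalue
```

The function would be called once per group, giving one p-value per group as in the comparison above.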
To further investigate the relationship between FIA scores and p-values, we modeled the number of FIA scores less than 4 versus the number of significant features after FDR correction (Fig. 2B). A strong and significant inverse linear correlation was observed (p = 0.0211). This inverse correlation means that groups with many FIA scores <4 had only a few significant features. Groups with few significant p-values are poorly defined, and it is plausible that many different features can disrupt correct prediction of group membership in such cases. Conversely, for well-defined groups (high number of significant p-values), only a few features are capable of changing the prediction at FIA < 4.
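The two per-group quantities compared in Fig. 2B can be computed along the following lines. This is a sketch under stated assumptions: the text only says "FDR correction", so the Benjamini-Hochberg procedure is assumed here, and the function names and the use of `None` for features without a FIA score are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def group_summary(fia_scores, p_values, fia_cutoff=4.0, alpha=0.05):
    """Count features with FIA < cutoff and features significant after FDR.

    fia_scores may contain None for features without a FIA score.
    """
    n_low_fia = sum(1 for s in fia_scores if s is not None and s < fia_cutoff)
    n_significant = sum(1 for q in benjamini_hochberg(p_values) if q <= alpha)
    return n_low_fia, n_significant
```

Collecting one `(n_low_fia, n_significant)` pair per group yields the points for a linear model of the kind described above.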
While the cutoff of 4 worked well to characterize features of high importance in this dataset, there may be other datasets and/or ANN models in which this cutoff is too high or too low to deliver sensible outcomes. In these cases, it might be better to analyze, for example, the lowest 100 FIA scores (“top 100”). Fig. 2C shows the correlation between the mean of the top 100 FIA scores and the number of significant p-values. An even stronger, positive correlation was observed in this case (p = 0.00084). This result means that lower averages of the FIA top 100 were found in less well-defined groups and vice versa. In this way, analysis of the characteristics of observed FIA scores may provide additional information about how well-defined a group is regarding metabolite signatures. If fewer than 100 FIA scores are available, similar trends are seen when using the top 10 FIA scores (Figure S1 in the Supplemental Materials).
In conclusion, the FIA score method identified features that also had significant p-values in a separate statistical test. It is important to note that, in contrast to t-tests, FIA score calculation has no prerequisites such as data normality, making FIA scores more generally applicable than hypothesis tests.
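As a minimal illustration of the "top 100" summary, the mean of the k lowest FIA scores of a group could be computed as follows. The function name `top_k_mean` and the convention that features without a FIA score are stored as `None` are assumptions made for this sketch.

```python
def top_k_mean(fia_scores, k=100):
    """Mean of the k lowest FIA scores; features without a score are ignored."""
    scored = sorted(s for s in fia_scores if s is not None)
    top = scored[:k]  # uses all available scores if fewer than k exist
    if not top:
        raise ValueError("no FIA scores available")
    return sum(top) / len(top)
```

With `k=10` the same function gives the top 10 variant mentioned above.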
FIA algorithm performance was validated on an additional dataset of human samples (Shearer et al., 2021), employing a different ANN architecture. Results were comparable and are shown in the Supplemental Materials. Interestingly, FIA analysis was able to identify features of interest that were missed by other data analysis approaches in this validation dataset.
3.3 Recommendations for interpreting FIA scores
Based on our analyses, we recommend the following rules for calculating and interpreting FIA scores:
For calculations, a real-life dataset containing a variety of samples from the different observed groups is required to allow for selecting meaningful 1st and 99th percentile feature values.
FIA scores of 1.0 indicate features of maximum impact on the prediction in all samples of the dataset. Scores between 1.01 and 1.99 indicate very strong impact, but only in part of the samples. Scores between 2.0 and 3.99 indicate strong impact of this feature. Scores equal to or larger than 4 indicate signals of medium to low impact. These recommendations are summarized in Table S1 in the Supplemental Materials.
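The interpretation ranges above can be written as a small lookup function. This is an illustrative sketch only: the function name `interpret_fia` is hypothetical, and the ranges are treated as half-open intervals (for example, 1.995 falls into the "very strong" category), which is an assumption about values lying between the stated bounds.

```python
def interpret_fia(score):
    """Map a FIA score to the impact category described in the text.

    score is None for features that were not assigned a FIA score.
    """
    if score is None:
        return "no measurable impact"
    if score < 1.0:
        raise ValueError("FIA scores cannot be smaller than 1")
    if score == 1.0:
        return "maximum impact in all samples"
    if score < 2.0:
        return "very strong impact in part of the samples"
    if score < 4.0:
        return "strong impact"
    return "medium to low impact"
```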
FIA < 4 seemed to be a sensible cutoff to find important features in the analyzed data. As other datasets and/or predictive models might require more complex combinations of features, no FIA scores of less than 4 might be observed. In these cases, analyzing the top 10 or top 100 FIA scores might be more meaningful. One should note that all features that share the minimum FIA score are of equal importance. For example, if the lowest observed FIA score is 20.5, and there are 20 features with FIA = 20.5, all of these 20 features should be considered in the analysis, not just the top 10, as the feature order will be arbitrarily chosen in the case of ties.
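The tie-handling rule can be made explicit with a short sketch that extends a top-k selection to include every feature sharing the score at the cutoff position. The function name `top_k_with_ties` and the dictionary input are assumptions for illustration.

```python
def top_k_with_ties(fia_scores, k=10):
    """Return the k lowest-scoring features plus all features tied with the k-th.

    fia_scores: dict feature -> FIA score (only features that received a score)
    """
    ranked = sorted(fia_scores.items(), key=lambda item: item[1])
    if len(ranked) <= k:
        return [feature for feature, _ in ranked]
    threshold = ranked[k - 1][1]  # score of the k-th ranked feature
    # Keep everything at or below the threshold so ties are never split.
    return [feature for feature, score in ranked if score <= threshold]
```

For the example above (20 features with FIA = 20.5 as the lowest score), a top 10 request returns all 20 tied features rather than an arbitrary subset of 10.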