ENMs, apical toxicity and mechanistic data overview
The present study focuses on a comprehensive toxicity assessment of 31 industrially relevant ENMs. The set of ENMs covers eight different chemistries, spanning both metal and metal oxide nanomaterials (including semiconductor crystals) as well as carbonaceous materials, i.e., multi-walled carbon nanotubes and nanodiamonds (NDs) with three different surface modifications, i.e., amino/ammonium groups, carboxyl/carboxylate groups and poly(ethylene glycol) (PEG)-terminated surfaces, and unmodified ENMs (designated as “core” ENMs), along with variations in size (e.g., 5 nm versus 20 nm Au particles) and shape (i.e., spherical versus rod-shaped TiO2 particles). We focused on industrially relevant ENMs and the role of surface modifications. Although it is impossible to cover all available ENMs in the context of a single study due to limited resources, those selected here are representative of a range of different physicochemical properties and hazard potentials for robust predictions. Detailed information on the synthesis can be found in ref. 14. Moreover, the Expanded Methods in the Suppl. Information file provides details on the 31 ENMs and the experimental protocols. A rigorous grid design was applied in order to allow direct evaluation of the effects of the chemistry and surface properties, as well as their combination. The multi-omics datasets included global expression levels of mRNA, miRNA and proteins from two human cell lines (monocyte-like THP-1 cells and bronchial epithelial BEAS-2B cells), mRNA expression from mouse lung tissues, protein corona profiles, and comprehensive characterization of all the ENMs (Suppl. Table S1). THP-1 cells are commonly used as a model for evaluating the cyto- and immunotoxicity of ENMs15,16, while BEAS-2B cells are a preferred model for the in vitro assessment of pulmonary toxicity including the potential genotoxicity of ENMs17,18. The multi-omics-based analysis was based on an equipotent, low-concentration exposure (EC10). The EC10 was identified by means of concentration-response studies carried in the THP-1 and BEAS-2B cell lines, as the dose (for each of the 31 EMNs) that elicited 10% cell death. At this subtoxic concentration, the mechanism of action measurements do not mirror the toxicity, since toxic phenotypes are not yet exacerbated. In fact, while the molecular alterations (e.g., at the transcriptomics level) can be intuitively used as a relatively direct measure of toxicity when the exposure doses are high enough to observe toxic phenotypes, they become the indication of important intermediate underlying mechanisms when observed at low, subtoxic, concentrations. Moreover, high doses that allow direct observation of acute toxicity in vitro and/or in vivo are often unable to mimic real life exposure scenarios. For these reasons, we decided to measure the molecular mechanism of action (transcriptomics, proteomics, etc.) in experimental conditions (EC10) in which they would not bluntly mirror toxicity phenotypes, but at which they would instead allow the identification of early mechanistic biomarkers of toxicity, as previously described19–21.
Grouping of ENMs into hazard categories
The data collected from the in vitro assays (cytotoxicity, genotoxicity and immunotoxicity) was homogenized by applying a point-based conversion system that resulted in a general toxicity score between 1 (no toxicity) and 6 (highest toxicity). Suppl. Tables S2-S5 describe the criteria to apply the point-based conversion system, while Tables S6-S8 illustrate some examples of the homogenization results. The Bayesian information criterion (BIC) was used to determine the number of clusters for each toxicity grouping (Suppl. Figure S1). The BIC was computed varying the number of clusters between 2 and 6. The analysis showed that 2, 3 and 4 are reasonable numbers of groups since they represent a good compromise between the BIC values and the model stability. Based on the result of the BIC analysis, three hazard categories were defined: 1) no-to-low (NoL), 2) medium (M), and 3) high (H) hazard. For the sake of clarity and brevity, only the results using these three hazard categories are shown here. However, results using 2 and 4 hazard categories are also reported in the Suppl. Figure S7.
We then defined three different classification tasks (Fig. 2) based on cytotoxicity (CYT), integration of the in vitro assays (INT), and the in vivo neutrophil infiltration (NEU). In the CYT task, the 31 ENMs were grouped into the hazard categories according to their ability to cause similar patterns of cytotoxicity, while the INT task considers the combination of the homogenized assays by the means of the multi-view clustering algorithm Similar Network Fusion (SNF)22. In parallel, we profiled the immunotoxicity of the 31 ENMs in C57BL/6 mice, based on the total cell counts as well as the number of macrophages, neutrophils, eosinophils, and lymphocytes identified in the bronchoalveolar lavage (BAL) fluid. Neutrophil counts in BAL fluid after 4 days exposure are a well-documented marker of acute inflammation in mice, as we and others have shown23–26. The neutrophils (NEU) counts were then used to divide the ENMs in the hazard categories: NoL (count < 1), M (1 ≤ count < 10) and H (count ≥ 10).
Exploratory analysis based on PCA and univariate strategies
Next, we performed explorative analysis to assess the distributions of each data layer. For each dataset, the first two principal components were computed and used to plot the projected data points (Suppl. Figures S2-S4). Overall, this analysis highlights that different combinations of omics data layers and cell types convey distinct information about the mechanism of action of ENMs since the distributions of projected data points (ENMs) are remarkably different between each other. While the density of data points in the mRNA of BEAS-2B dataset is the most homogeneous, the protein corona and all the datasets collected from THP-1 are heavily influenced by a few samples. Altogether, the scatterplots highlight the absence of linear separability of the toxicity classes by using all the available molecular features. This is not a surprising phenomenon in omics datasets, in which it is expected that the majority of the total variance of the data is associated with more than two dimensions27. Moreover, to define a baseline classification capability based on the sole predictive power of each independent variable in every dataset, the level of association between every variable in each dataset and the toxicity class of the ENMs according to the different labeling schemes (CYT, INT and NEU) was evaluated by fitting the univariate logistic regression model with repeated random splits (Fig. 3a). We found that single variables are only weak predictors of the ENMs toxicity.
Selecting multivariate biomarkers for ENM safety assessment
We further hypothesized that if any relationship between the intrinsic properties/molecular data and the toxicity grouping is present, it may be encoded in higher order interaction terms. Therefore, we applied multivariate modelling strategies with the objective of finding compact sets of synergistic biomarkers.
We used toxicity-based groupings of the 31 ENMs as target variables, and the omics datasets and the intrinsic properties of the ENMs as predictors. Alternative feature selection and classification methods were used to determine the best composite biomarkers for each classification task (INT, CYT and NEU). We applied standard machine learning (ML) techniques such as Logistic Regression with PCA (LR-PCA), LASSO (a regularization-based method) and varSelRF28, and a recently described method, GARBO. GARBO is a specialized genetic algorithm that uses Fuzzy Logic and RF-based classifiers, to select biomarker sets that optimize the trade-off between classification accuracy and number of biomarkers29. GARBO, LR-PCA and LASSO were applied to select alternative biomarker models from the large-scale omics datasets (mRNA, miRNA and proteomics). Compared to LR-PCA and LASSO methods, GARBO generated composite biomarker models achieving the highest prediction accuracy in most of the used data layers (see Suppl. Figures S5-S6). We systematically evaluated the classification performances of the selected biomarkers when using a different number of groups for each defined classification task (see Suppl. Figure S7). Our evaluation highlights that the selected biomarker models achieve classification accuracies similar to those obtained when defining groupings of three classes. Figure 3 reports the classification performances of univariate models (Fig. 3a) and multivariate-based models selected by using the GARBO algorithm (Fig. 3b).
In the present paradigm, mRNA-, proteomics-, and protein corona-based models reached high accuracy (Fig. 3b). On the other hand, the models selected from the physicochemical properties performed less accurately as compared to the other data layers. As expected, the best models for the prediction of mouse lung neutrophil infiltration (NEU) were those built from the mRNA in vivo data. However, in the BEAS-2B cell line, the mRNA-based models also satisfactorily predict in vivo immunotoxicity (NEU), while proteomics-derived biomarkers have limited predictability both in respect to cytotoxicity (CYT) and immunotoxicity (NEU). On the other hand, protein-based feature sets in THP-1 cells exhibited high accuracy and stability scores for the cytotoxicity (CYT) classification task.
We also investigated whether the integration of different omics data types and physicochemical properties could improve the classification performance of the ENM safety classifier (Suppl. Figure S8). We observed that ensemble classifiers derived from single or multiple data layers in THP-1 outperformed those derived from other biological models, when considering the cytotoxicity classification task (CYT). However, when considering the integrated classification of toxicity (INT), the ensemble classifiers derived from single or multiple data layers in the BEAS-2B cell model systematically achieved the highest classification performances (Suppl. Figure S8).
Biomarkers to predict neutrophil infiltration
Bronchoalveolar lavage (BAL) immune cell identification and counting is a commonly accepted non-invasive procedure for the accurate and confident diagnosis of specific lung diseases30. Furthermore, neutrophil infiltration is a well-known marker of inflammation induced by ENM exposure. Here we asked whether the sets of specific biomarkers previously identified by means of linear regression would predict neutrophil infiltration. As shown in Suppl. Figures S9-S10, all ten models based on the in vivo mRNA data displayed satisfactory performances (R2 > 0.6) with respect to neutrophil BAL counts in mice. Predicting in vivo endpoints from in vitro data to facilitate the implementation of the 3R principles in nanosafety is a relevant topic. To this end, we analysed in vitro transcriptomics data to identify biomarkers that could serve as predictors of neutrophil infiltration in vivo (Suppl. Figure S11, Suppl Table S12). Overall, all the models derived from BEAS2B have good predictive performances (R2 > 0.6) and outperform the models generated from THP1 data.
External validation of mRNA-based classifiers
Next, external transcriptome datasets of in vitro and in vivo exposures to ENMs were retrieved from the NCBI GEO database (Suppl. Table S9) in order to validate the top 10 mRNA-based biomarker sets selected for each exposure system (THP-1, BEAS-2B and mouse lung) and classification task (CYT, INT and NEU). Suppl. Tables S10-S12 report the gene sets representing the best mRNA-based biomarker models. Some of the ENMs in the selected external datasets correspond to the same class of ENMs as the ones included in the training set (for instance, TiO2 and CNTs), while other types of materials were not represented in the training set (e.g., graphene oxide and crocidolite asbestos). For each biomarker set, an RF-based classifier was used to generate class probabilities. The scores from the top 10 RF-based models were averaged to yield one set of class probabilities for each test (NoL%, M%, H%), and the class associated with the highest score was chosen as the predicted class. Figures 4a-c report the level of toxicity predicted by each model (cell type/classification task) on each external dataset. In order to improve the legibility of the evaluation results, the predictions derived from different doses of the same exposure were averaged. The top mRNA models indicated a high hazard priority for MWCNTs (Mitsui-7, or MWCNT-mits7 in Fig. 4). MWCNT-7 (Mitsui-7), a nanomaterial known to cause damage to the lungs31 and classified as a potential human carcinogen by IARC32. Crocidolite asbestos, a known carcinogen, was also predicted as hazardous by the models trained on the toxicity classes derived from the integration of different toxicity endpoints. Anatase TiO2 was predicted to be hazardous. Indeed, the anatase form of TiO2 is known to be chemically more reactive leading to greater toxicity in vitro and in vivo as compared to the rutile form33,34. TiO2-nanobelts were also predicted as highly hazardous and this is in line with the findings of the corresponding original study35, where the authors characterized patterns of gene expression in THP-1 cells and primary small airway epithelial cells exposed to high doses of TiO2 nanobelts. Figures 4d-f indicate how close the predictions are made by different cell type/mRNA models. It is interesting to observe that THP-1-derived predictive models are closer to in vivo models than those derived using BEAS-2B when focusing on cytotoxicity. This supports the hypothesis that different in vitro models capture different aspects of the chemical exposures, hence collecting complementary data in multiple cell systems aids the in vitro-in vivo extrapolation of predictive biomarkers36. This is an important conclusion that accords well with previous studies aimed at assessing the capacity of in vitro assays to predict relevant in vivo outcomes23. Thus, it is unlikely that a single cell-based assay (focusing on a single endpoint) will accurately predict the more complex and concerted biological outcomes in vivo. Overall, the ENM safety classifier also yields robust and accurate results for external datasets and demonstrates the feasibility of a toxicogenomic based safety classification of ENMs. Moreover, this is the first study in which predictive models of nanotoxicity are validated in a large collection of manually curated public datasets. Our analysis shows that, despite the profound differences in experimental design, material characterization, and omics technologies used, published data can be of considerable practical utility when properly curated and made available to the community.
External validation of selected molecular markers
We observed that APOE, encoding apolipoprotein E, is consistently found in the THP-1 based models predictive of the cytotoxicity classification task (CYT) (Suppl. Table S10). To determine the validity of this molecular marker, we exposed THP-1 cells to a panel of amorphous SiO2 ENMs shown to display varying degrees of cytotoxicity (Fig. 5a). We then performed RT-PCR and found that the upregulation of APOE correlated with cytotoxicity when cells were exposed to 10 different SiO2 ENMs (Fig. 5b). This is thus fully in accordance with the CYT classifier for THP-1 cells (Suppl. Figure S12). However, SPNS2 was not validated as a biomarker of cytotoxicity.
Biological roles of the selected biomarkers
Focusing on the mRNA-based classifiers, several genes of interest were identified from the Suppl. Tables S10-S12. First, it is noteworthy that the BEAS-2B and THP-1 based models do not encompass the same genes nor do the in vitro models display similarities to the in vivo models at the level of the individual mRNAs. Second, for each model, it is noted that certain genes are more prevalent than others. For instance, LDLR, encoding the low-density lipoprotein receptor, is prominently featured in the case of the BEAS-2B based models selected for prediction of in vivo endpoints (NEU). Similarly, as noted above, APOE, encoding a lipid-binding protein, is consistently found in the THP-1 based models predictive of the cytotoxicity classification task (CYT) while CHIL3, encoding the chitinase-like 3 protein, is linked to the mouse lung-based models of in vivo classification. Despite the absence of endogenous chitin, a number of chitinases and chitinase-like proteins that bind but do not degrade chitin have been identified. These proteins play important roles in lung injury, and are also known to play key roles in Th2-dominated disorders such as asthma37,38. In addition, a previous study using a mouse model of ovalbumin-induced asthma revealed that exposure to graphene oxide increased macrophage production of chitinases, CHI3L1 and AMCase39.
As stated above, all the THP-1 models derived from single or multiple data layers outperformed other biological models for integrated (INT) classification tasks. Interestingly, the THP-1 based model featured three genes, namely AHRR, TMOD1, and NEAT1 (Suppl. Table S10, Suppl. Figure S13). AHRR encodes the aryl hydrocarbon receptor repressor (AhRR), which acts as a tumor suppressor gene in multiple human cancers.40 Moreover, AhRR has a major impact on regulating inflammation.41 TMOD1 encodes the tropomodulin 1 protein (Tmod1), whose role in nanotoxicity is unexplored, but it could be linked to actin cytoskeleton-related responses to ENMs. In fact, SWCNTs were previously shown to reorganize cellular actin structures.42 NEAT1 (nuclear paraspeckle assembly transcript 1), in turn, is a long non-coding RNA (lncRNA) known to be upregulated in multiple malignancies.43 Recently, NEAT1, a target gene of the tumour suppressor gene p53, was shown to enable tumorigenesis in vivo by promoting survival of oncogene-targeted cells.44 Interestingly, extracellular vesicles enriched in lncRNAs, such as NEAT1, drive fibrosis in a mouse model of ischemic heart disease.45 NEAT1 has also been suggested to drive the progression of liver fibrosis46 and, more recently, its role in the promotion of pulmonary fibrosis has been shown47,48. Indeed, fibrosis is a commonly observed adverse outcome related to ENM exposures, especially in the lung49,50. This supports our hypotheses that integrated methods may be applied to identify early biomarkers predictive of long-term outcomes of exposure. Altogether, these results not only provide a means to establish innovative tools for the prediction of the toxicity of ENMs, but also clarify important aspects related to the intermediate mechanisms of the exposure. This information can be used in further dedicated studies to draft new adverse outcome pathways and refine existing ones.