3.1 General workflow of data acquisition and processing
The meta-analysis of graphene cytotoxicity data was carried out following the workflow displayed in Figure 1. The topic was defined by listing search terms, which retrieved 792 publications. Because the available data differ in value, the publications were screened and selected in sequence (SI Table 1). After the first screening was completed, a second screening aimed at data collection was carried out, in which the remaining articles were carefully rechecked against refined exclusion criteria (SI Table 1). After data collection and normalization, the attributes extracted from the literature to describe graphene cytotoxicity, including physicochemical properties and cell- and experiment-related conditions, were selected and listed (Table 1). Subsequently, three indicators (cell viability, IC50 and LDH) were characterized and processed, and on this basis the importance of macroscopic properties was presented as a heatmap (Figure 3). Because each attribute comprises several sub-attributes (Figure 4), the sub-attributes were matched to each dataset again, and we analyzed whether the sub-attributes of a single attribute have significant effects on these indicators.
3.2 Cell viability model
Cell viability is an important parameter in toxicity studies of nanomaterials and is used to estimate the cellular metabolic level, which indicates growth potential26. To date, the majority of studies have focused on only a few aspects, such as the viability detection method27 and the oxidation state15, 17. By applying strict inclusion and exclusion criteria, we obtained 986 cell viability data samples from 57 publications, each with attributes describing the material properties and experimental conditions.
Cell viability models allow us to evaluate and predict the effect of graphene on cell viability. RF and linear regression models (SI Figure 1, SI Figure 2) for cell viability were therefore developed based on 10 features (selected from a total of 18 attributes after removing those with substantial missing data), including nanoparticle-, cell-, and screening method-related features identified from the compiled literature set (Table 1). A parallel coordinates plot of cell viability over these 10 attributes is shown in Figure 2. Figure 3 shows that exposure dose (R2=0.331), detection method (R2=0.154) and organ source (R2=0.101) had the most significant effects on cell viability (CV). Consistent with previous studies, exposure dose contributed most significantly to the toxicity of GO28-30. In our study, R2≥0.02 is taken to indicate good feature importance for CV. We find that experimental condition attributes such as exposure dose, diameter (especially 100-1000 nm), exposure time (12 h, 24 h and 48 h) and detection method (e.g. neutral red assay and resazurin) have significant effects on CV (Figure 4). In previous research, resazurin has been used as a cell-permeable redox indicator, which helps to track the number of living cells and provides a stable quantitative relationship between fluorescence intensity and cell number31, 32. Many studies support this view: cell viability in embryonic and cancer cells has been measured with quantitative resazurin-based assays because, although the redox reactions of many dyes in living cells reflect metabolic activity, resazurin is the more sensitive dye and can be used for repeated measurements33. Additionally, a few studies have reported that the neutral red assay evaluates cell viability under acidic pH or hypoxic conditions more accurately than other cell viability assays34. Furthermore, the most commonly used colorimetric assays, e.g. MTT (which readily penetrates viable eukaryotic cells), MTS and WST-1, have limitations, including the inability to account for variation in cellular metabolism throughout the life cycle35. However, MTT is still an important detection method for correlating cell viability (Figure 4). The diversity of the 14 cell assays, which differ in anatomical and biological performance, may imply various toxic reactions. Although the dependence of toxicity on the intrinsic properties of graphene and on the experimental conditions is complex, the relationship between the above-mentioned attributes and the degree of toxicity can be described within certain ranges of specific experimental conditions (e.g. exposure time and dose).
In our study, cell morph, especially the epithelial, neuronal and monocyte-macrophage types, showed feature importance for CV. For example, Hu et al.36 cultured A549 cells with different concentrations of fetal bovine serum and measured the effect of GO at each concentration, clearly observing GO cytotoxicity. Among all cell morph types, monocyte-macrophage cells, followed by epithelial cells, are of chief importance to cell viability37. The results also show that organ source (including bone, adipose and blood) and surface modification with PEI rank highest in importance for toxicity (Figure 4). PEI with a high grafting ratio is known to promote the adsorption of Cr by GOMN adsorbents35. Moreover, PEI may alter the balance of positive and negative charges and thereby make cells more sensitive.
When attributes were added to the model one by one in order of decreasing correlation (SI Figure 3), R2 decreased distinctly, especially after the oxidation state was added (Figure 5). This indicates that the correlation between an attribute and cell viability does not adequately represent the importance of that attribute. An all-combination search was therefore adopted to identify the most important attributes and attribute combinations for cell viability (Table 2). Accordingly, the order in which specific attributes were added reflects their significance for graphene toxicity, and the contribution of each attribute to the end points was calculated through the all-combination search to quantify its effect on CV. Attribute combinations in scrambled order were then enumerated in Python and ranked by prediction accuracy (Table 2); in this analysis, the cell viability RF regression model based on the above-mentioned 9 attributes showed good performance, with R2≈0.820. When three attributes (2, 6 and 7, Table 1) were excluded from the set, the RF model for cell viability with the remaining 6 attributes (that is, 1, 4, 5 and 8-10, Table 1) showed only a 1.8% decrease in R2 (from 0.820 to 0.805), which is in line with the percentage-based individual attribute contribution index above. Given the trade-off between generalization capability and strict model accuracy, the 6-attribute set is of chief significance for future prediction studies of CV.
The above-mentioned top six features were used for model prediction, which gave R2=0.805 (data shown in Figure 6), with the scatter plot representing the correlation for CV.
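The all-combination search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the attribute labels and the toy scores are hypothetical, and `toy_score` merely stands in for a real scorer such as the R2 of an RF model fitted on the chosen attributes. The last lines check the relative R2 decrease quoted in the text.

```python
from itertools import combinations


def all_combination_search(attributes, score_fn, min_size=1):
    """Score every attribute subset and return (score, subset) pairs, best first."""
    results = []
    for k in range(min_size, len(attributes) + 1):
        for subset in combinations(attributes, k):
            results.append((score_fn(subset), subset))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical per-attribute gains; a real scorer would refit the model
# on each subset and return its validation R2 instead.
TOY_GAIN = {"dose": 0.30, "method": 0.15, "organ": 0.10, "noise": -0.05}


def toy_score(subset):
    """Toy additive score used only to make the sketch runnable."""
    return round(sum(TOY_GAIN[name] for name in subset), 3)


def relative_drop(full, reduced):
    """Relative decrease of a score when going from the full to the reduced model."""
    return (full - reduced) / full


ranked = all_combination_search(list(TOY_GAIN), toy_score)
best_score, best_subset = ranked[0]
print(best_subset, best_score)

# Relative R2 decrease reported in the text (0.820 -> 0.805).
print(f"{relative_drop(0.820, 0.805):.1%}")  # 1.8%
```

The search is exhaustive, so its cost doubles with every added attribute; with 10 attributes there are 1023 non-empty subsets, which is still tractable.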
3.3 IC50 model
To give an overview of the large body of IC50 data available from the main studies, Figure 7 reports the toxic effect over the range of exposure concentrations.
The IC50 value is the most sensitive measurement, and the logarithm of IC50 has become a common parameter in cellular toxicity analysis38, 39. A total of 169 distinct IC50 values (regarded as an intuitive index of acute and chronic toxicity, derived from graphene cell viability data integrated over experiments) were identified from 49 reviewed research articles (Figure 6). The analysis of IC50 values was based on the same attribute set as that used for cell viability (10 attributes in total, Table 1). The top-ranked attributes for IC50 were diameter (R2=0.534), surface modification (R2=0.174) and oxidation state (R2=0.151). As shown in Figure 3, graphene diameter (especially in the range of 10^2.5 to 10^3) has been shown to affect cell morphology and seeding density40 and, in turn, the interaction of graphene with cell membranes, thereby shifting graphene toxicity. In our IC50 results, polysaccharides, which are more negatively charged polymers, adhere to graphene, increase its volume and change the surface charge of graphene and its derivatives17. This could explain why polysaccharide-modified graphene has the highest IC50 value and why surface modification is a relevant attribute in the IC50 model. By contrast, unmodified graphene showed a lower IC50 value, as cells are directly exposed to graphene, which facilitates its cellular uptake12. Oxygen-containing functional groups play an important role in the hydrophilicity, stability and membrane affinity of graphene, so different oxidation states contribute to the IC50 value to different degrees.
Among the experimental conditions, detection methods such as LDH, NRU and resazurin showed a high contribution (Figure 4), which may be traced back to the mechanisms of graphene toxicity in cells, such as oxidative stress21, 25. In addition, epithelial cells and peripheral monocytes within cell morph were identified as relevant sub-attributes for IC50, which suggests that these cells are more sensitive to graphene and its derivatives. However, a dependence on the sub-attributes of exposure dose, cell line, organ source, exposure time and cell source was hardly observed (Figure 3).
The IC50 RF model and its predictions (Figure 7), based on attributes describing graphene physicochemical properties and experimental conditions, showed better prediction performance and less scatter than the corresponding correlation for CV.
3.4 LDH release model
The purpose of developing the LDH release model is to estimate, in particular, the influence of graphene physicochemical properties on cytotoxicity. By applying strict inclusion and exclusion criteria, we obtained 100 LDH data samples from 13 publications, each with attributes describing graphene properties and experimental conditions. LDH release models were then established on this basis.
Cytotoxicity is usually evaluated by the release of lactate dehydrogenase (LDH)41, as LDH is an enzyme present in living cells that can hardly cross the membrane of normal cells. When cells are damaged, the membrane permeability changes and cytoplasmic LDH is released into the culture medium, which makes LDH a sensitive end-point index for evaluating the toxicity of graphene. For LDH release, the top-ranked attributes were detection method, organ source and exposure dose, with R2=0.352, 0.351 and 0.095, respectively (Figure 3). The detection methods for LDH release are often specific and limited. Among organ sources, breast (e.g. MCF10A and Hs578Bst) ranked first (Figure 4); however, this finding is not yet supported by sufficient literature. Although some studies have reported graphene-induced LDH release in breast cells, the issue remains open and needs further consideration and evidence.
We hypothesized that graphene cellular toxicity could be predicted from material parameters and cell-related attributes. Among the graphene-related attributes, organ source, detection method, surface modification, exposure dose, diameter and oxidation state ranked from high to low in feature importance for LDH release, whereas cell-related attributes (that is, cell source and cell line in this study) contributed little to LDH release (Figure 3).
For each dataset, we held out 20% of the data as the test set for comparing RF-predicted and observed LDH release (Figure 6). The LDH prediction model was based on seven attributes (exposure time and cell line excluded) and performed very well, with R2≈0.986.
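The hold-out evaluation described above can be illustrated with a short sketch. It assumes a simple random 80/20 split and the standard coefficient of determination; the function names, the seed and the example values are hypothetical and do not come from the study's dataset.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out `test_fraction` of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Integer stand-ins for 100 data samples (not real measurements).
samples = list(range(100))
train, test = train_test_split(samples)
print(len(train), len(test))  # 80 20

# Made-up observed and predicted values for a tiny test set.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(round(r2, 3))
```

With only about 20 test samples, a single split gives a noisy R2 estimate, so repeating the split with different seeds is a reasonable precaution.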
3.5 The proposed machine learning tools
In this study, some of the graphene-family attributes mentioned above had incomplete data in the literature. Therefore, 10 complete attributes associated with the end points (cell viability, extended IC50 and LDH release) were finally adopted. Statistically important features and the correlations among them were estimated using Random Forests (RFs), Support Vector Machine (SVM), LASSO regression and Elastic Net, all of which have been applied extensively in various fields. Moreover, we repeatedly used samples with related cytotoxicity assays to check the statistical models and validate the predictions. These machine learning tools are well suited to this study and to the type of data assessed.
Random Forests (RFs)42 are ensemble methods that generally use decision trees as base learners. They combine the trees by averaging many approximately unbiased but noisy models in order to reduce variance. Specifically, random forests perturb both the data samples and the input attributes during training, which also allows them to handle a variety of data types. RFs are widely adopted in bioinformatics research, including RNA methylation and protein interaction prediction43. In medical image analysis, they are mainly used for image processing, computer-aided diagnosis and exploration of the pathogenesis of certain diseases. Kesler44 employed random forests to extract and classify factors related to cognitive impairment. They were also used in Chiang's45 TLE research to determine whether TLE lateralizes to the left or right hippocampus, in Kacar's46 multiple sclerosis research, and in the framework proposed by Koley et al.47 for the automatic diagnosis of four types of brain tumors. In addition, Serag et al.48 adopted an automatic segmentation algorithm for human brain MRI images.
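The two sources of randomness and the averaging step mentioned above can be sketched as follows. This is only an illustration of the idea, not a full random forest: no trees are fitted, the helper names and attribute labels are hypothetical, and in common implementations the attribute subset is redrawn at each split rather than once per tree.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Sample perturbation: draw n_samples row indices with replacement."""
    return rng.choices(range(n_samples), k=n_samples)


def random_attribute_subset(attributes, k, rng):
    """Attribute perturbation: pick k distinct attributes without replacement."""
    return rng.sample(attributes, k)


def average_prediction(tree_predictions):
    """Combine the per-tree predictions of a regression forest by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(42)
attributes = ["dose", "time", "diameter", "method", "organ"]  # illustrative labels

rows = bootstrap_indices(10, rng)                   # rows one tree would be trained on
cols = random_attribute_subset(attributes, 2, rng)  # attributes that tree may use
print(rows, cols)

# Made-up predictions from three trees for one sample.
print(average_prediction([70.0, 80.0, 90.0]))  # 80.0
```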
Lasso regression49 shrinks some coefficients and can set coefficients with small absolute values exactly to 0, which enhances the generalization ability of the model. When the features are sparse, especially in a linear relationship, or when the main features must be identified among many, L1 regularization (Lasso regression) is preferred. Genome scans can easily produce tens of thousands of variables while the number of samples in medical research increases little, so that the number of samples is smaller than the number of variables, as in diabetes development prediction models50. In such cases, the Lasso regression model can improve prediction accuracy and simplify the model through variable selection. By contrast, Elastic Net51, which has been widely used in studies of protein structure-function relationships, is very useful when many features are correlated with each other52. When several independent variables are highly correlated, Lasso tends to select only one of them, more or less arbitrarily, whereas Elastic Net is more inclined to select several. For the Elastic Net model (0 < alpha < 1), the constraint region lies in shape between the diamond of the L1 penalty and the circle of the L2 penalty, so that the model trades off between selecting a single variable and selecting a group of correlated variables.
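For reference, the two penalized least-squares problems can be written as below. This is the standard textbook form under one common parameterization, in which alpha is the mixing weight between the L1 and L2 penalties; the symbols n, X, y, beta and lambda are introduced here for illustration, and the source does not state which parameterization was used.

```latex
\begin{align}
\hat{\beta}_{\mathrm{lasso}}
  &= \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
     + \lambda \lVert \beta \rVert_1 , \\
\hat{\beta}_{\mathrm{enet}}
  &= \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
     + \lambda \left( \alpha \lVert \beta \rVert_1
     + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right),
     \qquad 0 < \alpha < 1 .
\end{align}
```

Setting alpha = 1 recovers the Lasso and alpha = 0 gives ridge regression, which is why intermediate values produce a constraint region between the diamond and the circle.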
SVM53 is a tool used to find the hyperplane that maximizes the margin between samples of different classes54. It guarantees that the solution found is the global optimum rather than a local minimum, which also gives the SVM method good generalization ability for unknown samples. Owing to these advantages, SVM models have been widely used in various fields, especially pattern recognition55, including face detection and recognition, handwritten digit recognition, text classification, speaker recognition, and image recognition and retrieval56, 57. The empirical risk of an SVM is calculated from the hinge loss function, and a regularization term is added to this robust and sparse classifier to control the structural risk58.
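The combination of hinge loss and a regularization term can be made concrete with a small sketch of the soft-margin objective for a linear SVM. It assumes the usual form 0.5*||w||^2 + C * sum of hinge losses with labels in {-1, +1}; the function names and the toy data are hypothetical.

```python
def decision_scores(weights, bias, X):
    """Linear decision function w.x + b for every row of X."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]


def hinge_losses(y, scores):
    """Per-sample hinge loss max(0, 1 - y * score), with y in {-1, +1}."""
    return [max(0.0, 1.0 - yi * si) for yi, si in zip(y, scores)]


def structural_risk(weights, bias, X, y, C=1.0):
    """Regularization term plus C times the summed hinge loss (empirical risk)."""
    regularization = 0.5 * sum(w * w for w in weights)
    empirical = sum(hinge_losses(y, decision_scores(weights, bias, X)))
    return regularization + C * empirical


# Toy data: the third sample lies inside the margin and is penalized.
X = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.0]]
y = [1, -1, 1]
w, b = [1.0, -1.0], 0.0

print(hinge_losses(y, decision_scores(w, b, X)))  # [0.0, 0.0, 0.5]
print(structural_risk(w, b, X, y, C=1.0))         # 1.5
print(structural_risk(w, b, X, y, C=10.0))        # 6.0
```

A larger C weights margin violations more heavily relative to the regularization term, which is the role of the C values discussed for the SVM models in this study.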
In this study, the four machine learning tools used (i.e. RFs, SVM, LASSO and ENet) have been widely applied in various fields, especially toxicology. The tools use quantitative toxicological profiles as features, including the individual contribution of each attribute to the indicators, and generate predictions from their cumulative contributions. Moreover, regression models for cell viability as the toxicity index were developed by all-combination search within a framework in which the model and the most suitable attributes were selected automatically. This permitted a comprehensive analysis of large indicator datasets, whether systematic or not. The most correlated attributes were also assessed for the end points (i.e. cell viability, IC50 and LDH). To overcome the challenge of integrating cytotoxicity data from various studies and to achieve good predictive performance, strict inclusion criteria were applied to the pool of published literature, and the data were rigorously limited to the widely adopted measures of cell viability, IC50 and LDH release.
With regard to applicability, we draw particular attention to random forests, which are among the most accurate of current algorithms, run efficiently on large datasets, and in our study remained effective even when a large amount of data (such as diameter) was missing. Above all, random forests help estimate which variables are important for the prediction.
In this study, 10 types of characteristic parameters affecting IC50, LDH and cell viability were chosen. Statistically important features and the correlations among them were estimated using random forests, SVM, LASSO regression, linear regression and Elastic Net. In addition to the random forest models, we also analyzed the remaining machine learning tools mentioned above. For model selection, the data were divided into a test set and a validation set, and the MAE on the validation set was used to evaluate each method. The Mean Absolute Error (MAE)59 measures the performance of an algorithm as the average absolute difference between the predicted and true values. The smaller the value, the better the algorithm performs. MAE was therefore selected as the main evaluation criterion in this meta-analysis. Figure 8 shows a boxplot of the prediction performance of each machine learning tool. Note that Figure 8 is based only on a certain number of assays (those that have an annotation) and on 20 sets of data. Figure 8 shows that the SVM model has the greatest MAE value among all models on the test set, followed by the lasso and ElasticNet models. The MAE values of the lasso (alpha = 0.05, 0.1 and 0.2) and ElasticNet (ratio = 0.85, 0.5 and 0.1) models did not vary appreciably across the different cross-validation studies. For the SVM, the parameter values examined in training were C = 800, 1000 and 10000, where C controls the cost of misclassification. Moreover, when C = 10000, the 20 sets of MAE values are relatively concentrated.
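The MAE criterion described above is simple to compute. A minimal sketch follows, in which the function name, the model labels and all numbers are hypothetical and serve only to show how models would be ranked by validation MAE.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up cell viability values (%) and predictions from two hypothetical models.
observed = [80.0, 60.0, 40.0, 20.0]
predictions = {
    "model_a": [75.0, 65.0, 45.0, 10.0],
    "model_b": [60.0, 80.0, 30.0, 40.0],
}

mae_by_model = {
    name: mean_absolute_error(observed, pred) for name, pred in predictions.items()
}
best_model = min(mae_by_model, key=mae_by_model.get)

print(mae_by_model)  # {'model_a': 6.25, 'model_b': 17.5}
print(best_model)    # model_a
```

Because MAE is expressed in the units of the end point, values for cell viability, IC50 and LDH release are not directly comparable with each other, only across models for the same end point.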