Bibliometric analysis is the most convenient and least time-consuming approach to obtaining an overall picture (i.e., trends, current progress, etc.) of a research field. It enables researchers to summarize overall research trends and to identify links between the variables in the field(s). Bibliometric analysis can also be used to identify the most frequently evaluated component(s) in a research area[60, 61]. In tissue engineering in particular, a large number of materials/composites are being investigated for their efficacy in replacing damaged or degenerated cartilage. Among them, polymers are at the forefront of creating biomaterial substitutes (i.e., scaffolds)[62–64]. In cartilage tissue engineering, many types and combinations of polymers are being investigated to mimic articular cartilage[8, 65]. In this study, bibliometric analysis was used to select the most suitable polymer(s) and/or polymer combinations. Review papers were extracted from the Web of Science using ‘cartilages’ and ‘polymers’ as keywords, and their citation details were downloaded for the period 2005-2020. Using the bibliometrix package in R, a list of highly cited review papers was extracted; the ten most cited papers are summarized in Table 1.
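The ranking step can be illustrated with a short Python sketch. This is not the bibliometrix workflow used in the study; it assumes the exported Web of Science records have already been parsed into dictionaries with hypothetical `title`, `year` and `times_cited` fields, restricts them to 2005-2020 and keeps the ten most cited.

```python
from typing import Dict, List

# Hypothetical record layout: each exported review paper is a dict with
# "title", "year" and "times_cited" keys (field names are assumptions).
Record = Dict[str, object]


def top_cited(records: List[Record], start: int = 2005, end: int = 2020, k: int = 10) -> List[Record]:
    """Return the k most-cited records published between start and end (inclusive)."""
    in_window = [r for r in records if start <= int(r["year"]) <= end]
    # Sort by citation count, highest first; ties keep their original order.
    ranked = sorted(in_window, key=lambda r: int(r["times_cited"]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy records for illustration only; not data from the study.
    demo = [
        {"title": "Review A", "year": 2010, "times_cited": 120},
        {"title": "Review B", "year": 2003, "times_cited": 900},  # outside the window
        {"title": "Review C", "year": 2018, "times_cited": 450},
    ]
    for rec in top_cited(demo, k=2):
        print(rec["title"], rec["times_cited"])
```

With the toy records, Review B is excluded by the date window even though it is the most cited, and Review C is ranked above Review A.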
When the bibliometric analysis was run with ‘cartilages’ and ‘polymers’ as keywords, the most recurrent words were displayed as a word cloud (Fig. 5). Every keyword in the word cloud appeared more than 70 times in the published literature, and the keywords are displayed in larger to smaller fonts according to how often they recur. Fig. 5 shows that cartilage, scaffold properties, collagen, polymers, hydrogels, mechanical strength and chondrocytes are among the most recurrent keywords, suggesting that these are among the most important parameters to consider when designing a new material for cartilage repair. In this study, our focus was limited to the mechanical strength of the polymer(s) needed to mimic articular cartilage. Based on the mechanical properties (i.e., tensile strength, tensile modulus, elongation, etc.), their recurrence in the review papers and data availability in the PoLyInfo database, a list of polymers/composites was prepared for use in the machine learning algorithms (Table 2).
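The keyword-frequency step behind the word cloud amounts to a simple count. The sketch below is an assumption-laden illustration rather than the bibliometrix implementation: it takes hypothetical per-paper keyword lists, normalises case and whitespace, and keeps keywords occurring more than 70 times (the threshold quoted above). A word-cloud tool would then scale font size by these counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_keywords(keyword_lists: Iterable[List[str]], min_count: int = 70) -> List[Tuple[str, int]]:
    """Count keyword occurrences across papers and keep those above min_count."""
    counts: Counter = Counter()
    for keywords in keyword_lists:
        # Normalise case and surrounding whitespace so "Hydrogels" and "hydrogels " merge.
        counts.update(k.strip().lower() for k in keywords)
    # Keep keywords that occur MORE than min_count times, most frequent first.
    return [(k, n) for k, n in counts.most_common() if n > min_count]
```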
Selection and preprocessing of the database
Depending on the load or the direction of stretching, the components of cartilage, especially the collagen fibrils and proteoglycans, reorient towards the direction of the load. Initially, when the tensile stress is low, only the collagen fibers realign. Once the cartilage undergoes large deformation, the stretching of the collagen fibers gives the tissue a high tensile stiffness. When the tension is removed, the collagen fibrils and proteoglycans return to their normal positions. Accordingly, the tensile (viscoelastic) behaviour of cartilage is best described by mechanical properties such as tensile strength, elongation at break and tensile modulus[43, 67, 68]. Therefore, in this study the ranges of tensile strength, elongation at break and tensile modulus were used to build the database, so as to account for the viscoelastic behaviour of cartilage.
Typically, in the inverse design approach, the properties of polymers/composites are used as the input, and the output is the composite most suitable for the intended application. In this study, the input is the numerical range of the selected properties, and the output is a categorical variable (i.e., a string or text). The raw data for tensile modulus, tensile strength and elongation at break are shown as scatter plots in Figs. 6-8, respectively. Note that the database was built from the list of polymers/composites most frequently used in cartilage repair. The raw data retrieved from the databases contained outliers that had to be screened out before running the machine learning algorithms. After the outliers were removed, the most concentrated data zones were selected for all three properties of interest. As shown in Fig. 7, the tensile strength data are so concentrated that they almost form a straight line, whereas the tensile modulus and elongation data (Figs. 6 and 8) are somewhat more scattered. The blue rectangular boxes in Figs. 6-8 mark the selected numerical ranges of tensile modulus, tensile strength and elongation, respectively. After outlier removal, the ranges of tensile modulus, tensile strength and elongation were 0-2 GPa, 0-0.2 GPa and 0-400%, respectively. These ranges are consistent with, and encompass, the mechanical properties of human cartilage presented in Table 3.
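The final range-based screening can be sketched as follows. The study does not state its exact outlier criterion, so this sketch only applies the retention windows reported above; the `PolymerRecord` layout (name, modulus in GPa, strength in GPa, elongation in %) is a hypothetical assumption.

```python
from typing import List, NamedTuple


class PolymerRecord(NamedTuple):
    # Hypothetical record layout; units follow the ranges quoted in the text.
    name: str
    modulus_gpa: float
    strength_gpa: float
    elongation_pct: float


# Retention windows taken from the cleaned ranges reported in the text.
RANGES = {
    "modulus_gpa": (0.0, 2.0),
    "strength_gpa": (0.0, 0.2),
    "elongation_pct": (0.0, 400.0),
}


def within_ranges(rec: PolymerRecord) -> bool:
    """True if every property of the record lies inside its retention window."""
    return all(lo <= getattr(rec, field) <= hi for field, (lo, hi) in RANGES.items())


def screen(records: List[PolymerRecord]) -> List[PolymerRecord]:
    """Drop records with any property outside the concentrated data zone."""
    return [r for r in records if within_ranges(r)]
```

A record is kept only if all three properties fall inside their windows, so a single out-of-range property is enough to discard it.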
Machine Learning Algorithms:
I. Random forest regression
Random forest is a supervised machine learning technique that can perform classification or regression by combining the predictions of many decision trees. Random forest regression proceeds in three steps: first, samples are drawn randomly from the training data by bootstrapping (sampling with replacement); second, a randomized decision tree is grown on each bootstrap sample, with each split considering only a random subset of the input variables; finally, the predictions of the individual trees are aggregated (averaged, in the case of regression) to obtain the final output[51, 69, 70].
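The three steps can be illustrated with a deliberately simplified random forest regressor in plain Python. To keep the sketch short, each "tree" is a depth-one tree (a stump); the study used full regression trees in R, so this only illustrates bootstrapping, feature randomisation and averaging and does not reproduce the model. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, mean of left leaf, mean of right leaf).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[Sequence[float]], y: List[float], features: List[int]) -> Stump:
    """Find the single split (over the allowed features) that minimises squared error."""
    ybar = mean(y)
    best: Stump = (features[0], float("inf"), ybar, ybar)  # fallback: predict the mean
    best_sse = sum((v - ybar) ** 2 for v in y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (f, t, lm, rm)
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm


def fit_forest(X: List[Sequence[float]], y: List[float], n_trees: int = 100,
               mtry: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (draw n rows with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: grow a randomised tree on a random subset of mtry features.
        feats = rng.sample(range(p), mtry)
        forest.append(fit_stump(Xb, yb, feats))
    return forest


def predict_forest(forest: List[Stump], row: Sequence[float]) -> float:
    # Step 3: aggregate the individual tree predictions (mean, for regression).
    return mean(predict_stump(s, row) for s in forest)
```

For example, `fit_forest([[x] for x in range(20)], [0.0] * 10 + [10.0] * 10, n_trees=50)` learns a step function: predictions near x = 2 are close to 0 and near x = 17 close to 10.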
In this study, the random forest regression data consisted of numerical inputs and categorical response variables (i.e., outputs), which were converted to a numerical format for regression. The training set was fed into the algorithm and several trials were run, with the regression trees regrown until a minimum mean square error (MSE) was obtained. Fig. 9 shows how the error varies as the number of trees (n) increases. For all three properties of interest (i.e., tensile modulus, tensile strength and elongation at break), the error initially fluctuated and then stabilized. From Fig. 9 (a), (b) and (c), the number of trees (n) needed to stabilize the error is about 250, 250 and 500 for tensile modulus, tensile strength and elongation at break, respectively[71, 72]. Because the elongation values varied widely, an extra scaling step was carried out for them. Once the errors had stabilized for all three input properties, the model was ready to predict the desired outputs.
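The study does not state its stopping rule or its scaling method, so the sketch below assumes two simple choices: min-max scaling for the widely varying elongation values, and, as the "stabilized" number of trees, the smallest n after which the error stays within a relative tolerance of its final value. Both functions are hypothetical illustrations.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; one possible choice for the scaling step."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def trees_to_stabilise(errors: Sequence[float], tol: float = 0.01) -> int:
    """Smallest number of trees n such that the error for every forest size
    from n onwards stays within tol * |final error| of the final error.

    errors[i] is the MSE of a forest with i + 1 trees.
    """
    final = errors[-1]
    band = tol * abs(final)
    n_stable = len(errors)
    # Walk backwards while the error stays inside the band around the final value.
    for i in range(len(errors) - 1, -1, -1):
        if abs(errors[i] - final) > band:
            break
        n_stable = i + 1
    return n_stable
```

For an error curve such as `[5.0, 3.0, 2.0, 1.5, 1.01, 1.005, 1.0, 1.0]` with a 2% tolerance, the error is taken as stable from the fifth tree onwards.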
Once the minimum MSE had been reached on the training set, the testing set was fed to the random forest regression algorithm to predict the outputs. The predicted outputs were then compared with the observed data to produce the residual plots in Fig. 10. Fig. 10 shows that the predicted and observed values of the properties of interest agree closely, with a strong linear relationship between them for all three properties related to the mechanical strength of the cartilage substitutes.
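The predicted-versus-observed comparison can be quantified as in the sketch below, which computes the residuals and a least-squares line through the (observed, predicted) pairs together with Pearson's r. It is a generic illustration with hypothetical function names, not the code used to draw Fig. 10. For a close fit, the slope is near 1, the intercept near 0 and r near 1.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def residuals(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual = observed - predicted for each test sample."""
    return [o - p for o, p in zip(observed, predicted)]


def fit_line(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line predicted = a + b * observed, plus Pearson's r."""
    mx, my = mean(observed), mean(predicted)
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    b = sxy / sxx           # slope
    a = my - b * mx         # intercept
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r
```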
The final step is to predict the best polymers/composites for cartilage repair (the output) using the mechanical properties of natural cartilage (i.e., tensile modulus, tensile strength at yield and elongation at break) as the input. Depending on the extent of cartilage damage, certain mechanical properties may become more important than others. However, to be clinically successful as cartilage substitutes, biomaterials should satisfy the biomechanical requirements[74–76]. As load bearing is the main function of articular cartilage, its tensile biomechanical properties (i.e., tensile modulus, tensile strength, strain at fracture, etc.) are particularly crucial[74, 77, 78]. For some patients, the tensile modulus of the developed composite may need to match that of articular cartilage, whereas for others the tensile strength or elongation of the composite may be the most critical property. For this reason, the machine learning algorithm was first run for each single property and then for combinations of two and three properties. For each case, the predicted results are summarized in Table 4, which includes the mean square error of the modeled data, the % variation explained and the predicted outputs (polymers/composites with ranks). The mean square error reached its minimum with 500 random forest regression trees (Fig. 9). R square indicates the proportion of the variance in the dependent variable (the output, i.e., the name of the polymer/composite) that is explained by the model[79, 80]. The R square values for all fitted data were below 50%, which is reasonable for a categorical dependent variable; R square values well below 50% are common when dealing with categorical variables[81, 82].
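Assuming that the "% variation explained" in Table 4 follows the usual random forest regression definition (as printed by R's randomForest package, where the predictions are out-of-bag), it equals 100 × (1 − MSE / Var(y)). The sketch below computes that quantity from observed values and predictions; the function name is hypothetical. The value can be negative when the predictions are worse than simply predicting the mean.

```python
from statistics import mean, pvariance
from typing import Sequence


def pct_var_explained(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """100 * (1 - MSE / Var(y)), using the population variance of y."""
    mse = mean((a - b) ** 2 for a, b in zip(y, y_pred))
    return 100.0 * (1.0 - mse / pvariance(y))
```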
Smith et al. evaluated the efficiency of random forest regression in predicting neurochemical concentrations, with R square values as low as 0.49 and 0.27. In another study, Wang et al. studied the cause of death (a categorical dependent variable) in prostate cancer using a random forest model, with R square ≈ 0.1707.
When only the tensile modulus of the developed cartilage substitutes (3 to 100 MPa) was considered, the random forest regression model predicted poly(epsilon-caprolactone) to be the ideal polymer/composite, since its tensile modulus falls within the range of natural cartilage. ScienceDirect and PubMed were used to check whether this predicted output (poly(epsilon-caprolactone) for cartilage tissue engineering) is supported by the literature. Using each model-predicted polymer together with ‘cartilage’ as keywords, the number of research articles published for each polymer was extracted and compiled into the pie chart in Fig. 11. Among the predicted outputs, poly(epsilon-caprolactone) is the second most frequently used polymer matrix for cartilage repair. The prediction made by the random forest regression model is therefore consistent with the literature.
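The pie-chart shares can be computed directly from the article counts. The sketch below uses only the counts quoted in this section for Fig. 11; the figure may include further polymers, so these percentages illustrate the calculation and are not the figure's exact shares.

```python
from typing import Dict, List, Tuple

# Article counts quoted in the text for Fig. 11 (PubMed + ScienceDirect).
COUNTS: Dict[str, int] = {
    "poly(lactic acid)": 6214,
    "poly(epsilon-caprolactone)": 1667,
    "poly[ethene-co-(vinyl acetate)]": 548,
    "poly(bisphenol A carbonate)": 132,
    "poly(dodecano-12-lactam)": 106,
}


def pie_shares(counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Percentage share of each polymer, largest first."""
    total = sum(counts.values())
    shares = [(name, 100.0 * n / total) for name, n in counts.items()]
    return sorted(shares, key=lambda kv: kv[1], reverse=True)
```

Ranking these counts places poly(epsilon-caprolactone) second, after poly(lactic acid), in line with the statement above.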
Similarly, the random forest regression model predicted polymers/composites using tensile strength at yield and elongation at break as separate testing inputs. According to Table 3, the tensile strength at yield and elongation at break of natural cartilage are ~35 MPa and 2-140%, respectively, and these values were set as the testing inputs to predict the polymers/composites most suitable for cartilage repair. The model predicted that the polymers/composites most likely to match the target tensile strength and elongation at break of articular cartilage are poly(lactic acid)/poly[ethene-co-(vinyl acetate)] and poly(dodecano-12-lactam), respectively. In support of these predictions, Fig. 11 shows that poly(lactic acid), poly[ethene-co-(vinyl acetate)] and poly(dodecano-12-lactam) appear 6214, 548 and 106 times, respectively, in the PubMed and ScienceDirect databases in connection with cartilage tissue engineering.
In addition to single-property optimization, multi-property optimization was also carried out with the random forest regression model. When the tensile modulus and tensile strength at yield of natural cartilage were set as the testing inputs, the model predicted the poly(epsilon-caprolactone)/poly(bisphenol A carbonate) blend (Table 4) as the best composite for cartilage substitutes. Interestingly, the same blend was also found to be the most suitable candidate for cartilage repair when all three properties of natural cartilage were used as the input. According to the ScienceDirect and PubMed databases, over the past 10 years poly(epsilon-caprolactone) and poly(bisphenol A carbonate) have been used as the polymer matrix for cartilage repair 1667 and 132 times, respectively (Fig. 11). Several other combinations of multi-variable optimization were modeled and are summarized in Table 4. Some of the predicted blends, such as poly(epsilon-caprolactone)/poly(bisphenol A carbonate) or poly(lactic acid)/poly[ethene-co-(vinyl acetate)], are not yet common in cartilage research; however, their constituent polymers are individually widely used in the development of cartilage substitutes[83–85].
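The single-, two- and three-property testing inputs can be enumerated systematically, as in the sketch below. The target values are those quoted in the text from Table 3; the property keys and helper names are hypothetical. Because the cleaned training ranges are expressed in GPa while the cartilage targets are in MPa, a conversion helper is included as an assumed preprocessing step.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Target cartilage properties as quoted in the text (Table 3), as (min, max).
TARGETS: Dict[str, Tuple[float, float]] = {
    "tensile_modulus_mpa": (3.0, 100.0),
    "tensile_strength_mpa": (35.0, 35.0),  # a single value (~35 MPa) is quoted
    "elongation_pct": (2.0, 140.0),
}


def mpa_to_gpa(value_mpa: float) -> float:
    """Convert MPa to GPa to match the units of the cleaned training ranges."""
    return value_mpa / 1000.0


def query_sets() -> List[Dict[str, Tuple[float, float]]]:
    """All single-, two- and three-property testing inputs (7 in total)."""
    names = list(TARGETS)
    queries = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            queries.append({name: TARGETS[name] for name in combo})
    return queries
```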
Poly(lactic acid) and poly(epsilon-caprolactone) are the most widely used polymers in cartilage repair[86, 87], owing to their superior mechanical integrity (i.e., viscoelasticity), remarkable biodegradability, biocompatibility and bioabsorbability. Even their degradation byproducts are non-toxic[86, 88–91]. Poly(dodecano-12-lactam) and poly(hexano-6-lactam) belong to the nylon group. They possess most of the attributes required of a scaffold for cartilage. Their chemical structure is another attractive feature, as their active (amide) groups are similar to those of collagen[92–94].
More importantly, all of the random forest regression outputs (i.e., polymers/composites) listed in Table 4 possess excellent thermomechanical and viscoelastic properties. Each of these polymers/composites has been shown to be a suitable candidate for supporting the cell growth, proliferation and chondrogenic differentiation required for cartilage repair. Several reports confirm that, owing to their thermomechanical strength and biocompatibility, most of these polymers/composites (Table 4) have been approved by the FDA for use in other fields of tissue engineering[44, 45, 96, 97].
II. Multinomial logistic regression (MNLR):
In this study, numerical independent variables (i.e., inputs) and categorical response variables (i.e., outputs) were used. The response variable comprised ninety-seven different polymers/composites, i.e., a factor with multiple levels; hence, multinomial logistic regression (MNLR) was deemed suitable for modelling the response as a factor[58, 98, 99]. Each numerical input was specified at two levels, the minimum and maximum of its range, for tensile strength at yield, tensile modulus and elongation at break. The input was either an individual factor or a combination of factors. For example, with the tensile modulus of the composites as the input (i.e., a single factor), the model was first fitted to the training data sets. The range of the tensile modulus of cartilage was then used as the testing input to predict the polymer blends whose properties best match those of cartilage. The predicted responses were the blends poly(glycolic acid)//poly(lactic acid) and poly(methyl methacrylate)//poly(epsilon-caprolactone) (Table 5).
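To illustrate how an MNLR model turns numerical inputs into a predicted polymer/composite, the sketch below computes baseline-category (softmax) probabilities from a set of coefficients and returns the most probable class. The class names and coefficients are hypothetical placeholders; in the study the coefficients were estimated in R from the ninety-seven-level response.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical coefficients: class -> (intercept, slopes per input).
# The reference class has all-zero coefficients (baseline-category logits).
Coefs = Dict[str, Tuple[float, Sequence[float]]]


def class_probabilities(coefs: Coefs, x: Sequence[float]) -> Dict[str, float]:
    """Multinomial logistic (softmax) probabilities for the input vector x."""
    eta = {k: b0 + sum(b * xi for b, xi in zip(bs, x)) for k, (b0, bs) in coefs.items()}
    m = max(eta.values())  # subtract the maximum for numerical stability
    expo = {k: math.exp(v - m) for k, v in eta.items()}
    z = sum(expo.values())
    return {k: v / z for k, v in expo.items()}


def predict_class(coefs: Coefs, x: Sequence[float]) -> str:
    """Return the class with the highest predicted probability."""
    probs = class_probabilities(coefs, x)
    return max(probs, key=probs.get)
```

With `coefs = {"ref": (0.0, [0.0]), "A": (1.0, [-2.0]), "B": (-1.0, [2.0])}`, an input of 0 favours class A and an input of 2 favours class B.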
The goodness of fit of the model was assessed by comparing its residual deviance (Dm = −2LLm = 1466.6345) with the residual deviance of the null model (D0 = −2LL0 = 1763.898), which includes only the intercepts. The deviance measures how poorly a model reproduces the observed data. The likelihood ratio test (G = D0 − Dm = 297.26296, df = 94, p < .001) compares these two deviances. The null hypothesis is rejected, indicating a statistically significant decrease in deviance when the predictor (X) is included in the model; that is, the model fits the data better than the null model in terms of the correspondence between the observed and predicted conditional probabilities. The residual deviance and the corresponding P value for each model are summarized in Table 6. Table 6 shows that the null hypothesis was rejected for all of the independent variables, i.e., the P value is significant (P < 0.05) for all parameters.
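In R this p-value would normally come from `pchisq(G, df, lower.tail = FALSE)`. The dependency-free sketch below reproduces the arithmetic of the likelihood ratio test from the deviances reported above, using a textbook (Numerical Recipes-style) evaluation of the regularised incomplete gamma function for the chi-square upper tail. The function names are hypothetical.

```python
import math
from typing import Tuple

_EPS = 1e-15
_TINY = 1e-300


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularised lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    ap, term = a, 1.0 / a
    total = term
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * _EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularised upper incomplete gamma Q(a, x) via Lentz's continued fraction (x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < _TINY:
            d = _TINY
        c = b + an / c
        if abs(c) < _TINY:
            c = _TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > stat)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


def likelihood_ratio_test(d_null: float, d_model: float, df: int) -> Tuple[float, float]:
    """G = D0 - Dm and its chi-square p-value on df degrees of freedom."""
    g = d_null - d_model
    return g, chi2_sf(g, df)
```

`likelihood_ratio_test(1763.898, 1466.6345, 94)` gives G ≈ 297.26 and a p-value far below 0.001. The small difference from the reported G = 297.26296 reflects rounding of the reported deviances.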
The MNLR model was fitted using the neural network package in R over 100 iterations. The residuals were plotted against the fitted values to produce the scatter plot in Fig. 12, considering all three independent variables (i.e., tensile modulus, tensile strength and elongation at break of natural cartilage) used in this study. The scatter plot in Fig. 12 supports the assumptions of independence, homoscedasticity and linearity. When a tensile modulus of 3-100 MPa, an elongation of 2-140% and a tensile strength of 35 MPa were supplied to the fitted model, the MNLR model predicted the polyethene/polyethene-graft-poly(maleic anhydride) blend as the most suitable for cartilage repair. The predicted results, along with the residual deviance for all other individual and combined testing inputs, are summarized in Table 5.
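One common way to define residuals for a multinomial model is the response residual: for each observation and class, the indicator of the observed class minus the fitted probability. The sketch below assumes that definition; the study does not state which residual type was plotted in Fig. 12, and the function name is hypothetical. Because fitted probabilities sum to one, each observation's residuals sum to zero.

```python
from typing import Dict, List, Sequence


def response_residuals(observed: Sequence[str],
                       fitted: Sequence[Dict[str, float]]) -> List[Dict[str, float]]:
    """Residual for each class = indicator(observed == class) - fitted probability."""
    out = []
    for y, probs in zip(observed, fitted):
        out.append({k: (1.0 if k == y else 0.0) - p for k, p in probs.items()})
    return out
```

Plotting each residual against its fitted probability gives a residual-versus-fitted scatter plot of the kind described above.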
To check whether the predictions made by the MNLR model are accurate and relevant to cartilage tissue engineering, the names of the predicted polymers/composites were used as search keywords in PubMed and ScienceDirect. The search results are summarized in the pie chart in Fig. 13. Polyethylene and poly(lactic acid) have been mentioned in connection with cartilage tissue engineering 10603 and 6214 times, respectively. Poly(lactic acid) (PLA) and poly(caprolactone) (PCL) belong to the group of linear aliphatic polyesters, and PCL is known to increase cell viability by 20%. Even the degradation byproducts of PLA (i.e., water and carbon dioxide) are non-toxic. Both polypropylene and polyethylene are widely used in implants, as they are inexpensive and easily molded into the desired shape[104–107]. They are known to elicit a minimal immune response and to have superior mechanical (i.e., viscoelastic) properties and biocompatibility[108–110]. In particular, both PLA and PCL can be modified to exhibit the viscoelastic properties required to mimic cartilage[111–115]. Polypropylene has also proven to be an excellent candidate for cartilage in nasal reconstructive surgery. Overall, all of the polymers/composites in the pie chart (Fig. 13) have been employed in cartilage tissue engineering[108, 109, 117, 118].