Accelerated Discovery of the Polymer Matrix for Cartilage Repair Through Machine Learning Algorithms

Cartilage repair is one of the most challenging tasks for orthopedic surgeons and researchers. The primary challenge lies in the fact that the development of the extracellular matrix requires specialized cells known as chondrocytes, which are sparse in number. The minimal self-renewal capacity of chondrocytes makes cartilage repair even more troublesome and expensive. In designing successful cartilage substitutes, the selection of materials for scaffold fabrication plays the central role, among several other important factors, in ensuring the survival and proliferation of any biomaterial substitute. Over the last few decades, polymers and polymer combinations have been extensively used to fabricate such scaffolds and have shown promising results in terms of mechanical integrity and biocompatibility. In an empirical approach, the selection of the most appropriate polymer(s) for cartilage repair is an expensive and time-consuming affair, as it traditionally requires numerous trials. Moreover, it is humanly impossible to go through the huge library of literature available on the potential polymer(s) and to correlate the physical, mechanical and biological properties that might be suitable for cartilage tissue engineering. With the advancement of machine learning, material design may experience a significant reduction in experimental time and cost. The objective of this study is to implement an inverse design approach to select the best polymer(s) or composites for cartilage repair by using machine learning algorithms, namely random forest regression (i.e., regression trees) and multinomial logistic regression. In these algorithms, the mechanical properties of the polymers, which are similar to those of cartilage, are considered as the input and the polymer(s)/composites are the predicted output.
According to the random forest regression and multinomial logistic regression, the polymer(s)/composites (i.e., the output) with characteristics closest to those of articular cartilage were found to be a composite of polycaprolactone and poly(bisphenol A carbonate) and a blend of polyethylene/polyethylene-graft-poly(maleic anhydride), respectively. These composites exhibit biomechanical properties similar to those of natural cartilage and initiate only minimal immune responses.


Introduction
Cartilages are connective tissues mostly present in the long bones of the human body. Their primary function is to provide lubrication and to act as a cushion against friction during movement. Damage to these tissues can occur due to trauma, obesity, aging, osteoarthritis, and a few other diseases. Often, even a minute tear in the cartilage leads over time to further irreversible damage [1,2]. Patients with disintegrating cartilage experience debilitating joint pain followed by restricted movement [3,4].
Alarmingly, more than 200 million people around the globe suffer from osteoarthritis [5].
Chondrocytes play a significant role by producing the extracellular matrix (ECM) required for the repair of cartilage. However, chondrocytes have only a limited capacity for self-renewal, which makes cartilage repair difficult [6,7]. Therefore, the insertion of cartilage substitutes is deemed to be a potential solution.
The damaged cartilages are often replaced by using several surgical procedures such as total knee replacement, microfracture, and mosaicplasty. Moreover, as a possible therapeutic option, the exponentially [28,29]. In materials science, experimental design can be carried out through a direct design or an inverse design approach. The direct, or conventional, design approach predicts the properties of the fabricated materials by taking the 'materials' as the input. Recently, with the advancement of machine learning, a new material design technique, namely inverse design, has become possible. Inverse design is a fully data-driven approach that predicts the target materials by taking the relevant material properties (i.e., molecular structures, physical, mechanical, thermal, biological etc.) as the input [31,32].
For example, Venkatraman et al. (2018) used an evolutionary algorithm for the virtual screening of several classes of monomers while developing a batch of polymeric materials with high refractive index, in order to determine which chemical groups have a major effect in increasing the refractive indices of the developed materials [33]. In another study, Le (2020) used the Gaussian process regression method to predict the tensile strength of nanocomposites by setting the types and mechanical properties of the polymer matrices, the types and properties of carbon nanotubes as nanofillers, and the incorporation parameters as inputs [34]. While Venkatraman et al. (2018) [33] and Le (2020) [34] adopted the direct design approach, in a recent study Kim et al. [31] developed a deep learning neural network inverse design model to predict high-performance organic molecules by creating a relationship between their structure and their material properties. Very recently, in 2020, Kim et al. [35] employed the inverse design approach through a neural network algorithm in which the properties of 31,713 known zeolites were considered as input to predict 121 porous nanostructures.
To the best of the authors' knowledge, no study has yet been conducted to predict the polymer(s) or polymer composites that mimic human cartilage by using machine learning algorithm(s). The primary objective of this study is to implement an inverse design approach to obtain the target polymer(s)/composites that exhibit properties similar to those of human cartilage. This research was carried out in four steps. First, a systematic bibliometric analysis was carried out using the citation data of review articles in the field of cartilage tissue engineering. Second, the relevant database was created using the PoLyInfo library. Third, machine learning techniques (i.e., random forest regression and multinomial logistic regression) were used to run both single and multiple property optimization. In the final step, the machine learning algorithms were employed to predict the polymer(s) or polymer composites that possess functional properties similar to those of human cartilage (e.g., tensile modulus, tensile strength, and elongation at break).

Methodology

Bibliometric analysis:
Bibliometric analysis is a powerful tool that allows researchers to get an overview of a specific research field and/or the trend in which it is heading. The benefit of this analysis is that it extracts the original articles and their citation summaries to run an overall publication analysis in a particular field of interest [36,37]. From the large group of polymers and subgroups of polymers available in the market, the objective of this study was to discover the polymers/composites that are among the best for use in cartilage repair. To retrieve the major groups of polymers/composites, a bibliometric analysis was carried out. In this study, using 'cartilage' as the keyword, the titles, abstracts and citation reports of review journal articles were extracted, and the bibliometric analysis was run in R. The results containing the top ten highly cited articles are summarized in Table 1. Each review paper linked to cartilage repair was manually reviewed, and the names of the major polymers/composites mentioned in these papers were extracted and listed. These polymers/composites were selected based on their recurrent usage in cartilage tissue engineering. The final selection of the polymers/composites was made based on the availability of data in the PoLyInfo database and is summarized in Table 2.
Database creation:
Mechanical properties play a substantial part in the success and durability of biomaterials [38,39].
Specifically, in cartilage, a primary symptom of disease (i.e., osteoarthritis) is the deterioration of its mechanical properties [40]. Concerning the biomechanical properties of cartilage, the tensile strength, tensile modulus, and elongation at break are the most sought-after mechanical properties, since the main function of cartilage is to hold/resist the stress and compressive force exerted on the body part(s) of interest at any given moment [41-43]. The key mechanical properties of native articular cartilage were extracted from the literature and summarized in Table 3. The tensile strength, tensile modulus and elongation of natural cartilage reported in Table 3 are 35 MPa, 3-100 MPa and 2-140%, respectively. However, at strains below 15%, the tensile modulus reaches only 5 to 10 MPa [44]. Therefore, the database of the polymers/composites was created by taking these key mechanical properties of natural cartilage into account.
PoLyInfo is a section of the NIMS materials database, which extracts numerical data from the relevant sources (i.e., academic articles) [47]. In this study, the numerical values of the major mechanical properties of the polymers/composites used in cartilage repair were retrieved from the PoLyInfo database. The summarized database (Table 2) includes a collection of ninety-seven polymers/composites and their related mechanical properties. The ranges of the extracted values for each of the mechanical properties were chosen as the input, or independent variables, in this study, whereas the names of the polymers/composites were taken as the output, or dependent variable (i.e., categorical in nature), for the machine learning algorithms. The input and output variables were chosen in this way in order to implement the inverse design approach (Fig. 1). Through this design approach, the names of the polymers/composites were predicted by using the properties of natural cartilage extracted from journal articles (summarized in Table 3).
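The input/output layout described above can be pictured as a table in which each row pairs a set of property ranges (input) with a polymer name (output). The Python sketch below is illustrative only: the study itself was run in R, and the record type `PropertyRecord`, the helper `to_xy`, and the two placeholder rows with invented values are assumptions made for this example, not entries from PoLyInfo. Only the cartilage query values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyRecord:
    """One database row: property ranges (inputs) and polymer name (output)."""
    polymer: str             # categorical output label
    modulus_mpa: tuple       # (min, max) tensile modulus in MPa
    strength_mpa: tuple      # (min, max) tensile strength in MPa
    elongation_pct: tuple    # (min, max) elongation at break in %


def to_xy(records):
    """Split records into numeric inputs X and categorical outputs y."""
    X = [[*r.modulus_mpa, *r.strength_mpa, *r.elongation_pct] for r in records]
    y = [r.polymer for r in records]
    return X, y


# Placeholder rows with invented values, NOT data from PoLyInfo.
records = [
    PropertyRecord("polymer_A", (5.0, 80.0), (20.0, 40.0), (10.0, 120.0)),
    PropertyRecord("polymer_B", (900.0, 1500.0), (50.0, 70.0), (2.0, 8.0)),
]
X, y = to_xy(records)

# Query row built from the cartilage values quoted in the text:
# modulus 3-100 MPa, strength 35 MPa, elongation 2-140 %.
cartilage_query = [3.0, 100.0, 35.0, 35.0, 2.0, 140.0]
```

A fitted model would then be asked to map `cartilage_query` to one of the labels in `y`, which is the inverse direction compared with predicting properties from a known material.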
Machine Learning Algorithms:

I. Random forest regression:
Random forest regression is an ensemble machine learning technique that can handle supervised and unsupervised learning, continuous variables, and categorical variables. Its principal function is to derive an accurate functional relationship between the independent variables and the dependent variable [48]. This technique involves an ensemble learning method wherein the rules are developed by splitting the data into smaller subsets, leading to the development of smaller decision trees against the predictor. The concept of the decision tree was initially introduced by Breiman et al. in 1984 [49]. In this method, decisions are derived not from a single tree but from a cluster of trees. Distributing the data across these trees usually makes the model more stable [50]. In modeling, the result can be developed using 100 trees or 10,000 trees, and each tree yields its own regression function. The final decision is the collective output of all regression functions [51].
Variance in the decision tree can be minimized by using the bootstrapping, or bagging, approach. In the process of bootstrapping, samples are drawn at random with replacement from the training set, and a decision tree is grown on each sample using the CART algorithm [52]. On completion, the results are aggregated and averaged using the following equation to obtain the final output:
$$Y_n = \frac{1}{N}\sum_{i=1}^{N} Y_i$$
Here, $Y_n$ is the final predicted output, $Y_i$ is the output of the i-th generated tree, and $N$ is the number of decision trees [53].
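The bootstrapping and averaging steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the randomForest implementation used in the study: the base learner `fit_mean_learner` is a deliberately trivial stand-in for a CART tree, and the toy data and all function names are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_mean_learner(sample):
    """Stand-in for a CART tree: always predicts the mean target of its sample."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y


def bagged_predict(learners, x):
    """Final output = (1/N) * sum of the N individual predictions."""
    preds = [f(x) for f in learners]
    return sum(preds) / len(preds)


rng = random.Random(0)
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # toy (x, y) pairs
learners = [fit_mean_learner(bootstrap_sample(rows, rng)) for _ in range(100)]
y_hat = bagged_predict(learners, 2.5)
```

Because each learner sees a different resample, the individual predictions differ, and averaging them reduces the variance of the final estimate.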
In this study, the simulations (i.e., training and testing) were run in R version 3.6.1 [54], and the package randomForest [55] was used to build the decision trees. The predicted variable (i.e., output) chosen was a nominal categorical variable. For a categorical (non-numeric) variable with k levels, the random forest regression automatically assigns k-1 values [56]. The adequacy of the modeled data for predicting an output was demonstrated using the out-of-bag (OOB) error. In the random forest regression, a large number of trees were developed, and the aggregated average of the tree predictions was taken. The difference between the predicted and observed data was noted as the out-of-bag error for each observation [57].
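The out-of-bag idea can be illustrated with a small Python sketch: each observation is predicted only by the learners whose bootstrap sample did not contain it, and the squared differences are averaged. The function `oob_mse`, the mean-predicting stand-in for a tree, and the synthetic data are assumptions made for this example; the study relied on the OOB estimate reported by the randomForest package in R.

```python
import random


def oob_mse(rows, n_trees, rng):
    """Out-of-bag mean squared error with mean-predicting stand-ins for trees."""
    n = len(rows)
    sums = [0.0] * n    # running sum of OOB predictions per observation
    counts = [0] * n    # number of learners for which the observation was OOB
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap indices
        in_bag = set(idx)
        pred = sum(rows[i][1] for i in idx) / n      # stand-in "tree" prediction
        for j in range(n):
            if j not in in_bag:                      # observation j is out of bag
                sums[j] += pred
                counts[j] += 1
    errs = [(sums[j] / counts[j] - rows[j][1]) ** 2
            for j in range(n) if counts[j] > 0]
    return sum(errs) / len(errs)


rows = [(float(i), 2.0 * i) for i in range(10)]      # synthetic (x, y) pairs
mse = oob_mse(rows, 200, random.Random(1))
```

The stand-in learner ignores `x`, so the error here is large by construction; the point is only how in-bag and out-of-bag observations are separated.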
In this study, the random forest regression operation was performed according to the flowchart shown in Fig. 2. For the success of any machine learning algorithm, the preprocessing of the data is a crucial step. The main purpose of preprocessing is to convert raw data into structured data, through which errors are minimized. The raw data were first imported into R, and the preprocessing steps then included assigning each variable as a factor or a numeric variable, recalibrating the missing value(s) and removing the outliers. The data were then split into training and testing sets. To assure accuracy and to minimize overfitting of the data, bagging/bootstrapping was done in the algorithm. These steps were carried out systematically, as mentioned in Fig. 3, to feed the operational data into the random forest regression algorithm. Upon bootstrapping, the observed and training data sets were resampled, followed by the formation of multiple trees from the subsets of data. In the random forest regression algorithm, the outputs of the trees were averaged and aggregated. In this model, the goodness of fit was checked with the mean square error to determine whether the error had stabilized. Once the error reached its minimum, the testing data (i.e., properties of the polymeric composites) without the output were fed into the model. Once the difference between the predicted and the observed values was found to be at its minimum, the mechanical properties of natural cartilage (shown in Table 3) were fed into the system to obtain the desired output (i.e., the names of the polymers/composites).
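The preprocessing and splitting steps can be sketched as follows. This Python example is a simplified illustration and not the authors' R workflow: missing values are simply dropped rather than recalibrated, outliers are removed with a fixed range, and the helper names, the toy rows, and the 25% test fraction are all assumptions made for this example.

```python
import random


def preprocess(rows, low, high):
    """Drop rows whose value is missing or lies outside [low, high]."""
    return [(v, label) for v, label in rows
            if v is not None and low <= v <= high]


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and split off a test set."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy (tensile modulus in MPa, label) rows; values are invented.
raw = [(12.0, "A"), (None, "B"), (15.0, "B"),
       (9000.0, "C"), (30.0, "A"), (22.0, "C")]

clean = preprocess(raw, 0.0, 2000.0)        # keeps 4 of the 6 rows
train, test = train_test_split(clean, 0.25, random.Random(0))
```

Holding out the test rows before any model fitting is what allows the later comparison of predicted and observed values to say something about overfitting.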

II. Multinomial logistic regression:
Very few modelling techniques are available for dealing with a categorical dependent variable with multiple levels. Among those few techniques, multinomial logistic regression (MNLR) is one of the most suitable machine learning algorithms for modelling data with multiple factors and levels. The response data used with the multinomial logistic regression technique are typically categorical and have multiple levels. This approach can deduce the probability of occurrence of each output in the dataset. This regression is distinct from linear regression in that it applies a sigmoidal function to the data [58]. To evaluate modeled data with a categorical response variable, it is crucial to develop a relationship between the log odds and the explanatory variables. It is given by the following equation:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
where $x$ is the explanatory variable, the $\beta$s are the regression coefficients of the factor(s), and $p$ is the predicted probability. In dealing with the multiclass regression problem, a relationship between the input and output is developed by the following equation:
$$P(Y = j \mid x) = \frac{\exp(\beta_{0j} + \beta_{1j} x)}{\sum_{l=1}^{k} \exp(\beta_{0l} + \beta_{1l} x)}, \quad j = 1, \dots, k$$
where $k$ is the number of classes and the $\beta$s are the regression coefficients of the factor(s) [59].
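As a small numerical illustration of the log-odds and multiclass relationships described above, the following Python sketch computes class probabilities with the standard softmax form. The coefficient values and function names are hypothetical and chosen only for this example; they are not fitted values from the study.

```python
import math


def log_odds(p):
    """Logit transform: natural log of p / (1 - p)."""
    return math.log(p / (1.0 - p))


def softmax_probs(x, coefs):
    """Class probabilities for one explanatory variable x.

    coefs holds one (beta0, beta1) pair per class.
    """
    scores = [b0 + b1 * x for b0, b1 in coefs]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


coefs = [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.8)]   # hypothetical coefficients
probs = softmax_probs(1.0, coefs)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The class with the largest linear score receives the largest probability, so `predicted_class` is simply the index of the highest score.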
The overall workflow of MNLR is depicted in the form of a flowchart in Fig. 4. The initial step is the preprocessing of the data, as the computer cannot differentiate between factorial and numerical variables. Therefore, each parameter needed to be assigned as either numerical or categorical. The final step of the data preprocessing is the removal of any outlier(s) from the dataset. To check the accuracy of the prediction, the data were divided into training and testing sets. The training set was then fed into the algorithm and the likelihood ratio test was performed. The deviances of the null model and of the residuals were noted, and the goodness of fit of the model was confirmed. A new prediction was then retrieved using the testing data without the output. Once the difference between the observed and the predicted values (i.e., the residual) was at its minimum, the prediction was made using the tensile modulus, tensile strength and elongation at break of natural cartilage.

Bibliometric analysis
Bibliometric analysis is the most convenient and least time-consuming approach for obtaining the overall standing (i.e., trend, current progress etc.) of any research field. It enables researchers to summarize the overall research trends and to develop links between the variables in the field(s). Bibliometric analysis can be used to identify the most evaluated component(s) in an area of research [60,61]. Particularly in tissue engineering, a huge number of materials/composites are being investigated to evaluate their efficacy in replacing damaged or degrading cartilage. Among them, polymers are at the forefront in creating biomaterial substitutes (i.e., scaffolds) [62-64]. In cartilage tissue engineering, several different types and combinations of polymers are being investigated to mimic articular cartilage [8,65]. Bibliometric analysis was used in this study to select the most suitable polymer(s) and/or combinations of polymers. The review papers were extracted from the Web of Science using 'cartilages' and 'polymers' as the keywords. The citation details of the review articles were downloaded for the period 2005-2020. Using the bibliometrix package in R [66], a list of highly cited review papers was extracted, and the top ten cited papers are summarized in Table 1.
Upon running the bibliometric analysis using 'cartilages' and 'polymers' as keywords, the most recurrent words were displayed in the form of a wordcloud, as shown in Fig. 5. All of the keywords shown in the wordcloud appeared more than 70 times in the published literature. In Fig. 5, the keywords are displayed in larger to smaller fonts depending on their recurrence in the literature. It is evident from Fig. 5 that cartilage, scaffold properties, collagen, polymers, hydrogels, mechanical strength and chondrocytes are among the most recurrent keywords. In other words, these are the most important parameters to consider while designing a new material for cartilage repair. In this study, our focus was limited to the mechanical strength of the polymer(s) needed to mimic articular cartilage. Considering the mechanical properties (i.e., tensile strength, tensile modulus, elongation etc.), and based on the recurrent mentions in the review papers and the data availability in the PoLyInfo database, the list of polymers/composites to be used in the machine learning algorithms was prepared (shown in Table 2).
Selection and preprocessing of the database
Depending on the load or the direction of stretching, the components of cartilage, especially the collagen fibrils and proteoglycans, move towards the direction of the load. Initially, when the tensile stress is low, only realignment of the collagen fibers occurs [67]. Once the cartilage experiences large deformation, the collagen attains a large tensile stiffness due to the stretching of the collagen fibers. Once the tension is removed, the collagen fibrils and proteoglycans move back to their normal positions. Indeed, the viscoelasticity of cartilage in tension is best described by mechanical properties such as tensile strength, elongation at break and tensile modulus [43,67,68]. Therefore, in this study the ranges of the tensile strength, elongation at break and tensile modulus were included in the database to take account of the viscoelastic behaviour of cartilage.
Typically, in the inverse design approach the properties of the polymers/composites are used as the input, while the output is the most suitable composite for the intended application. In this study, the input is the numerical range of the selected properties, and the output is a categorical variable (i.e., string or text). For this purpose, scatter plots are shown in Figs. 6-8 to represent the raw data for tensile modulus, tensile strength and elongation at break, respectively. Note that the database was created based on the list of the polymers/composites most recurrently used in cartilage repair. The raw data retrieved from the databases contained outliers that needed to be screened/removed before running the machine learning algorithms. After removing the outliers, the most concentrated data zones were selected for all three properties of interest. As shown in Fig. 7, the tensile strength data are so concentrated that they almost form a straight line, whereas the tensile modulus and elongation data shown in Figs. 6 and 8 are somewhat more scattered. The blue rectangular boxes shown in Figs. 6-8 represent the numerical ranges of tensile modulus, tensile strength and elongation, respectively. After removal of the outliers, the ranges of the tensile modulus, tensile strength and elongation were found to be 0-2 GPa, 0-0.2 GPa and 0-400%, respectively. These ranges are in agreement with the mechanical properties of human cartilage presented in Table 3.
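Because the database ranges are quoted in GPa and the cartilage properties in MPa, a quick unit-consistent check makes the stated agreement explicit. The Python sketch below uses only the numbers given in the text; the helper `within` and the dictionary keys are hypothetical names introduced for this example.

```python
def within(inner, outer):
    """True if the interval `inner` lies entirely inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


GPA_TO_MPA = 1000.0

# Cleaned database ranges quoted in the text (converted to MPa where needed).
database_ranges = {
    "tensile_modulus_mpa": (0.0, 2.0 * GPA_TO_MPA),
    "tensile_strength_mpa": (0.0, 0.2 * GPA_TO_MPA),
    "elongation_pct": (0.0, 400.0),
}

# Natural cartilage values quoted in the text.
cartilage = {
    "tensile_modulus_mpa": (3.0, 100.0),
    "tensile_strength_mpa": (35.0, 35.0),
    "elongation_pct": (2.0, 140.0),
}

covered = {name: within(cartilage[name], database_ranges[name])
           for name in cartilage}
```

All three cartilage ranges fall inside the corresponding database ranges, so the query values lie within the region covered by the training data.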
Machine Learning Algorithms:

I. Random forest regression
Random forest regression is a supervised machine learning technique that predicts data by classification or regression. It uses decision trees to predict the most optimal outputs. The random forest regression is done in three steps. The first step involves randomly bootstrapping the samples, followed by the creation of multiple randomized trees. Finally, a prediction is retrieved from each of the trees and the decisions are combined to obtain the final outputs [51,69,70].
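When the output is a categorical label, one common way to combine the per-tree predictions is a majority vote, which also yields a ranking of candidate labels. The Python sketch below illustrates that idea with hypothetical votes and label names; whether the study's R workflow aggregated the trees in exactly this way is not stated in the text, so this should be read as an assumption.

```python
from collections import Counter


def rank_by_votes(tree_predictions):
    """Return (label, vote count) pairs, most-voted label first."""
    return Counter(tree_predictions).most_common()


# Hypothetical labels predicted by six individual trees.
votes = ["polymer_A", "polymer_B", "polymer_A",
         "polymer_C", "polymer_A", "polymer_B"]

ranking = rank_by_votes(votes)
top_label = ranking[0][0]
```

A ranked list of this kind is one way to obtain ordered candidate polymers rather than a single prediction.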
In this study, the random forest regression data consisted of numerical inputs and categorical response variables (i.e., outputs), which can later be converted to a numerical format. The training set was fed into the machine learning algorithm and several trials were run. The regression trees were developed multiple times until a minimum mean square error was obtained. A plot was produced to show how the error varies as the number of trees (n) increases. As shown in Fig. 9, for all three properties of interest (i.e., tensile modulus, tensile strength, elongation at break) the errors initially varied and then stabilized. From Fig. 9 (a), (b) and (c), it is evident that the number of trees (n) sufficient to stabilize the error is about 250, 250 and 500 for tensile modulus, tensile strength and elongation at break, respectively [71,72]. Since there was a large variation in the elongation values, an extra scaling step was carried out. Once the errors had stabilized for all three input properties, the model was ready to predict the desired outputs.
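One simple way to decide when an error curve has "stabilized" is to look for the first window of consecutive tree counts over which the error changes by less than a tolerance. The Python sketch below applies that rule to a synthetic, smoothly decaying curve; the function name, the window length, the tolerance, and the curve itself are assumptions made for this example and do not reproduce Fig. 9.

```python
def stabilization_point(errors, window, tol):
    """Return the first index i where errors[i:i + window] vary by less than tol.

    Returns None if no such window exists.
    """
    for i in range(len(errors) - window + 1):
        chunk = errors[i:i + window]
        if max(chunk) - min(chunk) < tol:
            return i
    return None


# Synthetic error curve: decays towards 0.2 as the number of trees grows.
errors = [1.0 / (k + 1) + 0.2 for k in range(500)]

n_stable = stabilization_point(errors, window=50, tol=0.001)
```

In practice the window and tolerance are judgment calls, which is why reading the stabilization point off a plot gives only approximate values.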
After the minimum mean square error was obtained for the training set, the testing set was fed to the random forest regression algorithm to predict the outputs. The predicted output(s) were then compared with the observed data to create the residual plots (shown in Fig. 10). It is evident from Fig. 10 that the predicted and observed values of the properties of interest fitted each other substantially. There is a robust linear relationship between the predicted and the observed values for all three properties related to the mechanical strength of the cartilage substitutes.
The final step involves predicting the best polymers/composites for cartilage repair (as the output) by using the mechanical properties of natural cartilage (i.e., tensile modulus, tensile strength at yield, and elongation at break) as the input. Depending on the extent of the cartilage damage, a certain mechanical property may become more important than the others [73]. However, to qualify as clinically successful cartilage substitutes, the biomaterials should satisfy the biomechanical requirements [74-76].
As load bearing is the main function of articular cartilage, its tensile biomechanical properties (i.e., tensile modulus, tensile strength, strain at fracture etc.) are particularly crucial [74,77,78]. For some patients, the tensile modulus of the developed composites may need to match that of articular cartilage, while for other patients the tensile strength or elongation of the composites may be the most critical property. For this reason, the machine learning algorithm was first run for single properties, followed by combinations of two and three properties. For each case, the predicted results are summarized in Table 4, which includes the mean square error of the modeled data, the % variation explained and the predicted output (polymers/composites with ranks). The mean square error was brought down to its minimum value with 500 random forest regression trees (as shown in Fig. 9). R-squared indicates the proportion of variance in the dependent variable (the output, i.e., the name of the polymers/composites) that is explained by the model [79,80]. The R-squared value for all the fitted data was found to be less than 50%, which is reasonable for a categorical dependent variable [81,82]. Typically, the R-squared value can be much lower than 50% when dealing with categorical variables. When only the tensile modulus of the developed cartilage substitutes (3 to 100 MPa) was considered, the model (i.e., random forest regression) predicted poly(epsilon-caprolactone) to be the ideal polymer/composite, as its tensile modulus lies in the range of natural cartilage. ScienceDirect and PubMed were searched to investigate whether the predicted output (poly(epsilon-caprolactone) for cartilage tissue engineering) is consistent with the published literature.
Using the model-predicted polymers and 'cartilage' as keywords, the number of research articles published for each polymer was extracted and compiled in a pie chart, as shown in Fig. 11. Among the predicted outputs, poly(epsilon-caprolactone) is the second most utilized polymer as a polymer matrix for cartilage repair. Therefore, the prediction made by the random forest regression model is in agreement with the published literature.
Similarly, the random forest regression model predicted the polymers/composites using the tensile strength at yield and elongation at break as separate testing inputs. According to Table 3, the tensile strength at yield and elongation at break of natural cartilage are ~35 MPa and 2-140%, respectively. Therefore, these numerical values were set as the testing inputs to predict the polymers/composites that are most suitable for cartilage repair. The model predicted that the polymer(s)/composites most likely to fit the set tensile strength and elongation at break of articular cartilage are poly(lactic acid)/poly[ethene-co-(vinyl acetate)] and poly(dodecano-12-lactam), respectively. In support of these predictions, Fig. 11 shows that poly(lactic acid), poly[ethene-co-(vinyl acetate)] and poly(dodecano-12-lactam) appeared 6214, 548 and 106 times, respectively, in the PubMed and ScienceDirect databases for cartilage tissue engineering.
In addition to the single property optimization, multi-property optimization was also carried out using the random forest regression model. The model predicted the poly(epsilon-caprolactone)/poly(bisphenol A carbonate) blend (Table 4) as the best composite for cartilage substitutes when the tensile modulus and tensile strength at yield of natural cartilage were set as the testing input. Interestingly, the same polymer blend (i.e., poly(epsilon-caprolactone)/poly(bisphenol A carbonate)) was found to be the most suitable candidate for cartilage repair when all three properties of natural cartilage were used as the input. According to the ScienceDirect and PubMed databases for the past 10 years, poly(epsilon-caprolactone) and poly(bisphenol A carbonate) have been used as a polymer matrix for cartilage repair 1667 and 132 times, respectively (Fig. 11). A few other combinations of multi-variable optimization were modeled and are summarized in Table 4. Among the outputs, a few of the predicted blends, such as poly(epsilon-caprolactone)/poly(bisphenol A carbonate) or poly(lactic acid)/poly[ethene-co-(vinyl acetate)], are not widely used in cartilage repair; however, these polymers individually are widely used in the development of cartilage substitutes [83-85].
Poly(lactic acid) and poly(epsilon-caprolactone) are the most widely used polymers in cartilage repair [86,87], owing to their superior mechanical integrity (i.e., viscoelasticity), remarkable biodegradability, biocompatibility and bioabsorbability. Even on degradation, their byproducts are non-toxic in nature [86,88-91]. Two of the predicted polymers belong to the nylon group: (1) poly(dodecano-12-lactam) and (2) poly(hexano-6-lactam). They possess most of the attributes required to perform as scaffolds for cartilage. Their chemical structure is yet another attractive feature, as their active groups are similar to those of collagen [92-94].
More importantly, all of the random forest regression outputs (i.e., polymer(s)/composites) listed in Table 4 possess excellent thermomechanical and viscoelastic properties. Each of these polymer(s)/composites has been shown to be an ideal candidate for facilitating the growth, proliferation and chondrogenic differentiation required for cartilage repair [95]. Several reports have confirmed that, due to their thermomechanical strength and biocompatibility, most of these polymers/composites (Table 4) have been approved by the FDA for use in other fields of tissue engineering [44,45,96,97].

II. Multinomial logistic regression (MNLR):
In this study, numerical independent variables (i.e., inputs) and categorical response variables (i.e., outputs) were used. The response variable comprised ninety-seven different polymers/composites and therefore had multiple levels; hence, multinomial logistic regression (MNLR) was deemed suitable for modelling the response variable as a factor [58,98,99]. Each numerical factor was specified at two levels, consisting of the minimum and maximum values of the tensile strength at yield, tensile modulus, and elongation at break. The input was either an individual factor or a combination of multiple factors. For example, taking the tensile modulus of the composites as the input (i.e., a single factor), the training data sets were modelled. After modelling with the training data, the range of the tensile modulus of cartilage was used as the testing input to predict the best polymer blends with properties similar to those of cartilage. The response variables were found to be the blends poly(glycolic acid)//poly(lactic acid) and poly(methyl methacrylate)//poly(epsilon-caprolactone) (as shown in Table 5).
The goodness of fit of the model was assessed by comparing its residual deviance ($D_m = -2LL_m = 1466.6345$) with the residual deviance of the null model ($D_0 = -2LL_0 = 1763.898$), which includes only the intercepts. The deviance is a measure of how poorly the model reproduces the observed data. The likelihood ratio test ($G = D_0 - D_m = 297.26296$, df = 94, p < .001) compares these two deviances. The null hypothesis is rejected, indicating a statistically significant decrease in the deviance when the predictor (X) is included in the model. This means that the model fits the data better than the null model in terms of the correspondence between the observed and predicted conditional probabilities. The goodness of fit of the modeled data was interpreted using the P value, and the residual deviance and its corresponding P value are summarized in Table 6. It is evident from Table 6 that the null hypothesis was rejected for all of the independent variables, and thereby the P value is significant for all of the parameters (P<0.05).
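The likelihood ratio statistic quoted above can be checked with a few lines of Python. The deviances are the values reported in the text; their difference as printed is about 297.26, matching the reported G to two decimals. The p-value here is only approximated with the Wilson-Hilferty normal approximation to the chi-square distribution, which is a substitute chosen to keep the sketch free of external libraries and is not the routine used in the study; the function names are hypothetical.

```python
import math


def likelihood_ratio(d_null, d_model):
    """G statistic: drop in deviance from the null model to the fitted model."""
    return d_null - d_model


def chi2_sf_wilson_hilferty(g, df):
    """Approximate upper-tail chi-square probability (Wilson-Hilferty)."""
    c = 2.0 / (9.0 * df)
    z = ((g / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


g = likelihood_ratio(1763.898, 1466.6345)   # deviances reported in the text
p = chi2_sf_wilson_hilferty(g, 94)          # df = 94 as reported
```

With a statistic this far above the degrees of freedom, the approximate p-value is far below 0.001, consistent with the rejection of the null model.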
The MNLR model was run using the neural network package in R with 100 iterations [100]. The residual values were plotted against the fitted values to generate the scatter plot (Fig. 12) while considering all three independent variables (i.e., tensile modulus, tensile strength and elongation at break of natural cartilage) used in this study. The scatter plot shown in Fig. 12 indicates data independence, homoscedasticity, and linearity. On feeding the tensile modulus of 3-100 MPa, the elongation of 2-140% and the tensile strength of 35 MPa to the fitted model, the multinomial regression model predicted the polyethene/polyethene-graft-poly(maleic anhydride) blend as the most suitable one for cartilage repair. The predicted results, along with the residual deviance for all other individual and combined testing inputs, are summarized in Table 5.
To confirm whether the predictions made by the MNLR model are accurate and relevant to cartilage tissue engineering, the names of the predicted polymers/composites were used as search keywords in PubMed and ScienceDirect. The search results are summarized in the form of a pie chart, as shown in Fig. 13. It was found that polyethylene and polylactic acid have been mentioned together with cartilage tissue engineering 10603 and 6214 times, respectively. Moreover, polylactic acid and polycaprolactone belong to the group of linear aliphatic polyesters [101], and polycaprolactone is known to increase cell viability by 20% [102]. Even the byproducts of the degradation of polylactic acid (i.e., water and carbon dioxide) are non-toxic [103]. Both polypropylene and polyethylene are widely used in developing implants, as they are easy to mold to the desired shape and are inexpensive [104][105][106][107]. They are known to initiate a minimal immune response and to have superior mechanical (i.e., viscoelastic) properties and biocompatibility [108][109][110]. In particular, both PLA and PCL can be modified to exhibit the viscoelastic properties required for mimicking cartilages [111][112][113][114][115]. Polypropylene has also proven to be an excellent candidate for developing cartilage in nasal reconstructive surgery [116]. Overall, all the polymers/composites shown in the pie chart (Fig. 13) have been employed in the field of cartilage tissue engineering [108, 109, 117, 118].

Conclusion
The design of new biomaterials is a complex, tedious, and time-consuming affair. Designing cartilage substitutes is even more intricate because of their unique properties and functionality and their diverse locations in the human body. Among many parameters, viscoelasticity is one of the most important to consider in designing cartilage substitutes. More importantly, the viscoelasticity of the cartilages cannot be attributed to any single property; rather, it is better represented by a set of mechanical properties such as tensile strength, tensile modulus, and elongation at break. Therefore, the best polymer matrices/composites for cartilage repair are expected to exhibit these properties within, or as close as possible to, the ranges of the natural articular cartilages. This study used an inverse design approach with two machine learning algorithms (i.e., random forest and multinomial logistic regression) to predict the most suitable polymers/composites for cartilage substitutes, taking the ranges of the tensile modulus, elongation at break and tensile strength of the natural cartilages as inputs. Both single-variable and multivariable optimizations were conducted, so that the output was predicted from both individual and combined properties of the cartilages. When all three properties of interest were considered, poly(epsilon-caprolactone)/poly(bisphenol A carbonate) and polyethene//polyethene-graft-poly(maleic anhydride) were found to be the best polymer(s)/composites for cartilage repair by the random forest and multinomial logistic regression techniques, respectively. All of the polymer(s)/composites predicted by both machine learning algorithms are FDA approved for use in cartilage tissue engineering; more importantly, they possess tensile biomechanical properties similar to those of the natural cartilages and may initiate only minimal immune responses in the body.
However, a limitation of this study lies in the low goodness of fit of the modeled data, which is largely attributed to the categorical nature of the response variable. Different machine learning algorithms may be explored to handle categorical variables with many levels. Moreover, the biological properties of the natural cartilages may be included as inputs in future research, although there is still a lack of an appropriate database to correlate the properties of the stem cells with the polymer matrices/composites to be used in cartilage repair.

Figure 1
Inverse design approach.

Figure 11
Pie chart displaying the number of times the polymers predicted by the RF algorithm appeared in ScienceDirect and PubMed.

Figure 12
Scatter plot of the residuals versus the fitted values.

Figure 13
Pie chart displaying the number of times the polymers predicted by the MNLR algorithm appeared in ScienceDirect and PubMed.
