Advances in Computational Intelligence of Polymer Composite Materials: Machine Learning Assisted Modeling, Analysis and Design

The superior multi-functional properties of polymer composites have made them an ideal choice for aerospace, automobile, marine, civil, and many other technologically demanding industries. The increasing demand for these composites calls for an extensive investigation of their physical, chemical and mechanical behavior under different exposure conditions. Machine learning (ML) has been recognized as a powerful predictive tool for data-driven multi-physical modeling, leading to unprecedented insights and exploration of system properties beyond the capability of traditional computational and experimental analyses. Here we aim to abridge the findings of the large volume of relevant literature and highlight the broad-spectrum potential of ML in applications like prediction, optimization, feature identification, uncertainty quantification, reliability and sensitivity analysis, along with the framework of different ML algorithms concerning polymer composites. Challenges like the curse of dimensionality, overfitting, noise and mixed-variable problems are discussed, including the latest advancements in ML that have the potential to be integrated into the field of polymer composites. Based on the extensive literature survey, a few recommendations on the exploitation of various ML algorithms for addressing different critical problems concerning polymer composites are provided, along with insightful perspectives on potential directions of future research.


Introduction
Aerospace, automobile, marine, civil and many other technologically demanding industries are looking for alternative multi-functional materials which offer a combination of excellent tunable properties like high strength, light weight, corrosion resistance, thermal insulation, acoustic damping and high fracture toughness, along with aesthetic features. Polymer composites are found to have a good mix of the above-mentioned properties along with rapid and easy manufacturability, which makes them a suitable alternative material to be utilized in various modern industries [1][2][3][4][5]. Generally, a polymer composite is made up of two phases: the matrix phase (continuous) and the reinforcement phase (dispersed). Usually, a thermosetting or thermoplastic organic polymer serves as the matrix, the basic purpose of which is to bind the reinforcement and transfer the load uniformly to the embedded reinforcement [6]. Different types of materials are used to strengthen the polymeric matrix and are known as reinforcing agents. These materials can be in the form of natural or man-made fibers, particles, whiskers and fragments [7,8]. Some examples include glass, carbon, Kevlar, aramid, flax, hemp, jute, sisal, coir, alumina, mica, basalt etc. [9]. Considering environmental and sustainability issues, biocomposites, made from biofibers and biopolymers, have recently been gaining popularity [10]. Most biofibers and biopolymers are biodegradable in nature and offer a great solution to the waste disposal problem. The most important advantage of using biopolymers as the matrix in composites is that they are renewable and carbon-neutral. However, the challenge in ramping up the applicability of such composites is the huge variation in their properties, as they are affected by a number of variables like the type of biopolymer and natural fibers used, the environmental conditions of their source, the type of fiber modification and the processing techniques [11]. From a computational viewpoint, this leads to a significantly large space of input dimensions, which makes the analysis and design process more complicated. From a larger perspective, polymer composites are classified into different categories based on the type of reinforcing material used (refer to Figure 1).
The shape, size, chemical composition and amount of the reinforcement, the way the reinforcing agents are added, and the manufacturing process of the composite greatly affect the overall resulting properties of the composite materials [12][13][14][15][16][17][18]. If continuous fibers are used to reinforce the matrix, then parameters like the orientation of the fibers and the stacking sequence are important to consider. If the composite is particulate reinforced, then the aspect ratio of the reinforcing particles plays an important role [19,20]. The next significant aspect that has a considerable impact on the mechanical properties of polymer composites is the interface between the two phases of the composite [21]. An interface is basically a two-dimensional zone between two layers having different microstructure [22]. Specific physical and chemical interactions that occur at the interface result in the formation of an interphase region with properties distinct from those of the bulk constituents. The chemical composition of the polymer matrix and the reinforcement, along with the filler geometry, affect the overall chemistry and bring in significant changes in the interfacial zone and the global morphology [15,[28][29][30]. The toughening mechanism, cavitation (the extent of plastic yielding), fiber bridging, fatigue behavior and shear strength of the overall composite are largely affected by the kind of interphase developed [31][32][33]. Therefore, it is important to possess a good understanding of the nature of the interfacial region while designing composite materials in order to suit a specific application. The process used for manufacturing the composite material is also a governing factor in deciding the mechanical response along with the long-term performance of the resulting composite.
Hand lay-up method, compression molding, resin transfer molding, injection molding, direct extrusion, pultrusion and vacuum molding are the processes generally used to manufacture composite materials.
The choice of manufacturing process depends on the intended application of the final product because each of the mentioned processes possesses a different processing speed, extent of void formation and curing temperature [34][35][36]. An inappropriate choice of manufacturing technique may lead to very high shear stress at the boundary of the reinforcement. Hence these key parameters act as process variables and need to be optimized considering the final application [37]. Another parameter influencing the mechanical behavior of composite materials is the residual stress and strain [38,39]. Stress transfer from the continuous phase to the dispersed phase is a very important phenomenon that critically affects the strength and stiffness of composites [40]. The difference in the elastic modulus and the Poisson's ratio has a vital role to play in the mechanism of stress transfer [41]. The coefficients of thermal expansion of the matrix and the reinforcement also need to be taken into account, since a mismatch in these coefficients may lead to the development of thermal residual stresses [42].
For developing composites of higher specific strength and stiffness along with higher impact, wear and fatigue resistance, it is important to consider the compound effect of the parameters mentioned above. Polymer composites thus have the advantage of synergistic properties which can easily be tailored for achieving a desirable specific set of properties by selecting the appropriate combination of continuous and dispersed phases [43]. Flexible and economical materials suiting a wide range of simultaneous objectives are what most modern industries demand. To explore the broader efficacy of polymer composites, it is important not to limit the composites to only one type of demand or objective by considering only a few process parameters while keeping others constant. Rather, all the process parameters need to be considered simultaneously for optimization, and that would be a remarkable contribution in the field of material design. Modeling the complex relationships between the governing parameters (both input and output) is extremely strenuous. Despite the availability of large experimental setups and computational tools, it is laborious and time-consuming to investigate the significance of each of the governing parameters experimentally. Over the last two decades, material science has undergone a steady shift from the phase of developing purely computational techniques for the discovery and design of new and complex materials to the phase of developing coupled methods that increase the reliability of results by making use of computational predictions and experimental validation. Finite element and molecular dynamics simulation methods have been used to model material behavior in various fields, but the complexity and computational intensiveness of these methods have encouraged the research community to look for other alternatives [44][45][46][47][48][49]. Therefore, many researchers have relied on the machine learning approach to determine the significance of the process parameters for an optimal design [50,51]. Machine learning (ML) provides a wider scope for efficiently investigating the behavior of the resulting composites with limited experimentation or computationally intensive realizations of expensive models (refer to Figure 2). Exploitation of ML allows one to achieve efficient prediction, optimization and characterization of such composites. Figure 2 illustrates: (a) property prediction and optimization of polymer composites using a convolutional neural network [52]; (b) predominant types of ML algorithms used in polymer composites; (c) application of machine learning for predicting the overall behavior of fiber-reinforced polymer composites [53].
Hereafter, this paper is organized as follows. Section 2: multifaceted applications of machine learning in polymer composites; section 3: critical review of the sampling techniques used for ML model formation; section 4: brief descriptions and review of commonly adopted ML algorithms; section 5: different approaches for ensuring accuracy and quality of ML models; section 6: discussion of a few critical issues encountered in ML based analyses of polymer composites and an overview of future research directions; section 7: concluding remarks.

Multifaceted applications of machine learning in polymer composites
A brain-cell interaction model was created by Donald Hebb in 1949 and described in his book entitled 'The Organization of Behavior' [54], wherein he presented the brain theory of neurons. The theory presented in this book is considered to be the first stepping-stone in the development of machine learning concepts. The first notion of Machine Learning (ML) came in 1959 through Arthur Samuel, who defined it as a field of study that gives computers the ability to learn without being explicitly programmed. It drew the attention of many researchers, who started investigating this area in earnest.
After the 1990s, ML gained pace with growing research and started being used in various fields like data analytics, predictive analysis, self-driving cars, fraud detection and prevention, stock exchange, text generation and analysis, image and face recognition, and pattern recognition [55][56][57][58].
Gradually machine learning became a very exciting tool in the overall research community wherein various statistical and probabilistic methods were proven to accelerate the fundamental and applied research [59]. Machine learning techniques have a long history of application in the fields of biology and chemistry [60][61][62]. The huge success of ML algorithms in these fields encouraged material scientists to explore the possibility of utilizing the same in designing and developing new materials having superior properties and wider applications [63].
Conventionally, experimentation used to play the key role in discovering and characterizing novel materials, but the advent of computational approaches has revolutionized the field of material science, and the combination of experiments and simulations has proven to be an efficient way of exploring new possibilities [64,65]. In the analysis of polymer composites, the large number of governing parameters and the various combinations that need to be evaluated in order to expand applications with multi-functional demands have motivated researchers to conduct more systematic and data-intensive research. The combination of experiments and computer simulations has been producing a huge expanse of data, which has made it possible to integrate machine learning algorithms with material science. Many successful attempts can be found in the literature, which are reviewed and summarized in this paper to bring to light the wider acceptability of this reliable and very powerful tool and to promote its further usage. Figure 3 illustrates the numerous applications of machine learning in polymer composites and material science, encompassing property prediction (and characterization), novel material discovery and other evolving areas; here, any machine learning algorithm can be adopted depending on its efficiency and suitability for a particular type of problem (refer to section 4). In the following subsections, we briefly discuss the vast scope of applicability of ML along with the respective algorithms, focusing on the literature concerning polymer composites.

Basic framework of machine learning paradigm in polymer composites
Supervised and unsupervised learning are two widely accepted machine learning paradigms which are used in material science. In supervised learning, a well-labeled dataset is generated and the input parameters are mapped to the known outputs, based on which predictions for new data are made [66,67]. In unsupervised learning, the problem in hand is approached without much information about the output; it allows the user to discover patterns in the data by deriving a structure in the dataset based on clustering and association rules [68,69]. In the case of polymer composites, supervised learning is more commonly used. Figure 4 shows the basic workflow of applying supervised learning techniques to predict material behavior.
Data preparation is the first and most critical step in utilizing any of the ML techniques [70]. One subset of material science is materials informatics, which is focused on modifying the form of data so as to utilize the available information effectively, and this is where most of the effort of ML scientists goes. Data acquisition in a systematic way is very important so that all the relevant explanatory variables are considered. After collecting the appropriate data, preprocessing is done in terms of formatting, cleaning and then sampling it. Formatting helps in bringing structure to the data, which ultimately enhances the data quality. Some of the attributes are deleted in the cleaning step so as to keep only the relevant parameters, and sampling is used to select a subset of the data out of a big chunk, which can further be utilized for training in machine learning [71,72]. Converting the raw information into certain relevant attributes, which are further used as input features for the selected algorithm, is a necessary step for getting accurate predictions and is commonly known as feature engineering [73]. It helps in increasing the learning accuracy along with improved comprehensibility [74][75][76].

Figure 5: Schematic representation of ML algorithms. The figure illustrates the basic scheme for the initial implementation of any learning algorithm (h: x→y), where h represents a hypothesis function that maps the input parameters (x) to the output (y), together with the selection of a suitable learning algorithm to be used for further prediction.
After preparing the data well, the next step is to set a hypothesis function (h(x)) which maps the input parameters (x) to the output (y) and to select a suitable learning algorithm. Based on the kind of data available and whether the problem in hand is classification or regression, an appropriate machine learning algorithm is chosen [77]. For a classification problem, commonly used algorithms include K-Nearest Neighbor (KNN), decision trees, neural networks, naive Bayes and support vector machines [78]. If the problem is a regression one, then algorithms like linear regression, support vector regression, neural networks, Gaussian process and ensemble methods are used [79].
Figure 6: Domain of applicability of machine learning. Applications of machine learning are explained systematically in terms of three components: input, output and system. Machine learning can bring in the feasibility of analysis for all these problem types by effectively creating efficient predictive models of the actual systems.

The next step is to train the chosen model with the processed data. The available data is split into three subsets, namely the training, cross-validation and testing datasets. The model learns to process the information using the training dataset. The cross-validation dataset is used for parameter tuning and to avoid the problem of overfitting. Evaluating the model is an essential part of the model development process: a model can have a very low error for the training data but still be inaccurate. For this purpose, a test dataset is used to evaluate the performance of the model, on the basis of which final predictions are made. This is how the final model is selected and the hypothesis function is evaluated. Figure 5 shows the basic model representation for the initial implementation of machine learning. A detailed overview of sampling, ML model formation and accuracy checking, as discussed above, is provided later in this paper (refer to sections 3 - 5).
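As a minimal sketch of this workflow, the example below assumes a small tabular dataset of hypothetical process parameters (fiber volume fraction, ply angle and curing temperature, with a synthetic target standing in for a measured property) and walks through the train/validation/test split, model fitting and final evaluation; the column choices and data are illustrative only.

# Minimal supervised-learning workflow sketch (hypothetical data and column choices).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical inputs: fiber volume fraction, ply angle (deg), curing temperature (C)
X = rng.uniform([0.1, 0.0, 120.0], [0.6, 90.0, 180.0], size=(500, 3))
# Hypothetical target: a synthetic stand-in for a measured stiffness-like property
y = 10.0 * X[:, 0] - 0.05 * X[:, 1] + 0.02 * X[:, 2] + rng.normal(0, 0.2, 500)

# Split into training, cross-validation and test subsets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

scaler = StandardScaler().fit(X_train)          # preprocessing fitted on training data only
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)   # learn the hypothesis h: x -> y

print("validation R2:", r2_score(y_val, model.predict(scaler.transform(X_val))))
print("test R2      :", r2_score(y_test, model.predict(scaler.transform(X_test))))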

Prediction, optimization and uncertainty quantification
In polymer composites, machine learning has mainly been used for prediction of material properties, process optimization, microstructural analysis, and quantification of the uncertainties arising in the material and its properties due to the complex manufacturing processes. Figure 6 is a precise illustration of various situations in computational material science where ML can be applied. For example, consider a polymer composite beam subjected to a load, where the input parameters, say the material properties of the beam, are known and the load is known, but the deflection caused by the applied load is unknown. Here ML can be used to form an efficient predictive model mapping the deflection in terms of the material properties and the load, which can then be used to obtain the deflection in a forward framework corresponding to any combination of values of the material properties and the load. The same ML model can also be utilized in an inverse framework for system identification, where the material properties can be obtained knowing the remaining two sets of information. In the following paragraphs, we provide a concise review of various such applications of machine learning in the context of polymer composites.
Pilania et al. [80] used density functional theory and a machine learning approach to predict the atomization energy, lattice energy, dielectric constant and spring constant of various polymeric chains, and found good agreement between the results of both approaches. Daghigh et al. [81] used K-Nearest Neighbor, a machine learning algorithm, to predict the heat deflection temperature of various biocomposites. Another research group [82] reported a related ML-assisted predictive study. Machine learning is also increasingly being applied to optimization problems. Stochastic and deterministic approaches are the two different approaches widely used in optimization algorithms. Deterministic algorithms make use of particular rules to find the solution, and the uncertainties in terms of the variable space are ignored [83,84], while stochastic algorithms are more like probabilistic methods wherein the uncertainties are modeled with suitable probability distributions [85]. A new approach, called robust optimization, is also used to explicitly model and minimize the uncertainty involved in the problem; it makes use of a set-based deterministic description of the uncertainties [86] and creates a mathematical framework for optimization under such uncertainty. For the formulation of optimization problems, stochastic algorithms use random objective functions and constraints. Optimal design is achieved by comparing different possible hypothesis functions and then estimating each of their corresponding cost functions (squared error functions) by identifying the design variables and constraints. The entire optimization problem is then expressed in a mathematical form and is solved using an optimization algorithm. Figure 8 gives the flowchart of the basic procedure followed for the design of an optimization problem. Figure 9 shows the classification of optimization algorithms based on the design variables, objective function and constraints. By coupling ML models with such optimization algorithms, more efficiency can be achieved and the feasibility of exploring a design field can be substantially expanded for a better design outcome, so that the search can be effectively focused on achieving the desired material properties. In material science, the quantification and representation of the microstructural design space are considered one of the key research questions. Finding an appropriate descriptor set for the microstructural characterization of materials is of great importance. Xu et al. [101] proposed a ML based approach to identify the key microstructural descriptors and developed a four-step process to find the microstructural design variables considering the best descriptor among all. The quantity and morphology of the reinforcing fillers in the polymeric matrix affect the mechanical properties of the resulting composite to a large extent, which makes microstructural optimization a significant research problem. A general flowchart of the implementation of machine learning to solve optimization problems is given in Figure 10.
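The loop sketched in Figure 10 can be illustrated with a minimal example: an ML surrogate (here a Gaussian process) is trained on a handful of evaluations of a hypothetical objective function standing in for an expensive simulation or experiment, and the surrogate is then searched cheaply over the design space. The objective, ranges and parameter names below are assumptions for illustration only.

# Sketch of ML-surrogate-assisted optimization (hypothetical objective as a stand-in
# for an expensive simulation/experiment of a composite design).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_objective(x):
    # Hypothetical response, e.g. a normalized stiffness-to-weight measure
    return np.sin(3 * x) * (1 - x) + 0.5 * x

rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(12, 1))          # few expensive evaluations
y_train = expensive_objective(X_train).ravel()

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
surrogate.fit(X_train, y_train)

# Cheap search over the design space using the surrogate instead of the true model
X_grid = np.linspace(0.0, 1.0, 1001).reshape(-1, 1)
y_pred = surrogate.predict(X_grid)
x_best = X_grid[np.argmax(y_pred), 0]
print("surrogate optimum near x =", round(x_best, 3))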
Computational analyses of polymer composites often encounter uncertainties because of the variations in the properties of the material, measurement uncertainty, limitations in the test set-up, operating environment and inaccurate geometrical features [102][103][104][105][106]. Uncertainty in parametric inputs, initial conditions and the boundary conditions, computational and numerical uncertainties arising from the unavoidable assumptions and approximations along with the inherent inaccuracy of the model result in major deviation from the deterministic values or the expected material behavior, altering the overall performance of composites. For ensuring that the simulation results are reliable and to understand the risks for making final product decisions, it is imperative to quantify these uncertainties. In the last few decades, researchers have made various attempts to quantify uncertainty [107][108][109][110][111], a brief review of which is given here. Bostanabad et al. [112] used the multi-response Gaussian process for uncertainty quantification in the simulation of woven fiber composites. They directly related the hyperparameters of the Gaussian process to the sources of physical uncertainty and reduced the overall computational cost. Doh et al. [113] used the approach of Bayesian inference to quantify the uncertainty of percolating electric conductance for the polymer nanocomposites reinforced with carbon nanotubes. They found that the correlation between the conductance of carbon nanotubes and the parameter of phase transition along with the critical exponent significantly affects the electrical conductance of the resulting composite in uncertainty quantification. ML models are trained on the available data and hence are prone to inherent uncertainties or errors associated with it. Jha et al. [114] considered the example of glass transition temperature to investigate the impact of uncertainties in the dataset on the predictions made by ML models. Using the Bayesian model they quantitatively represented the underlying uncertainties in the experimental values of the glass transition temperature.
Figure 11: ML assisted multi-scale uncertainty quantification of polymer composites. Flowchart for machine learning assisted uncertainty quantification and stochastic macro- and micro-mechanical analysis of laminated composites [135], wherein representative figures are given for finite element analysis, Sobol's quasi-random sampling, machine learning based surrogate model formation and uncertainty quantification (corresponding to the respective steps).

The methodology of Polynomial Chaos Expansion (PCE) has been extensively used to quantify the unavoidable uncertainties that exist in the matrix and the fiber, which consequently affect the material properties and their global responses [115][116][117]. This expansion is also known as the Wiener-Chaos expansion and is an effective technique for solving stochastic systems. In its most basic form, it is a way of representing random variables/processes in terms of orthogonal polynomials. The core concept of PCE is that it allows a random variable of any arbitrary distribution to be represented in terms of a random variable of our choice. Mathematically it is of the following form:

X = \sum_{i=0}^{\infty} a_i \, \Phi_i(\xi),

where X is a random variable which is decomposed in terms of deterministic coefficients a_i over an orthogonal polynomial basis \Phi_i(\xi) of a chosen random variable \xi.
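As a minimal sketch of a non-intrusive PCE, under the simplifying assumption of a single standard Gaussian input and a hypothetical response function, the example below estimates the deterministic coefficients by least-squares point collocation on a probabilists' Hermite basis and then reads the mean and variance directly off the coefficients using the orthogonality of the basis.

# Minimal sketch of a non-intrusive (regression-based) polynomial chaos expansion
# for a scalar response of one standard Gaussian input (a toy stand-in for an
# uncertain composite property).
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial

def model_response(xi):
    # Hypothetical response of the system to the uncertain input xi ~ N(0, 1)
    return np.exp(0.3 * xi) + 0.1 * xi**2

order = 4
rng = np.random.default_rng(2)
xi = rng.standard_normal(2000)                       # samples of the germ
# Design matrix of probabilists' Hermite polynomials He_0 ... He_order
Psi = np.column_stack([hermeval(xi, np.eye(order + 1)[n]) for n in range(order + 1)])
coeffs, *_ = np.linalg.lstsq(Psi, model_response(xi), rcond=None)

# Orthogonality of He_n gives the statistics directly from the coefficients
mean = coeffs[0]
variance = sum(coeffs[n] ** 2 * factorial(n) for n in range(1, order + 1))
print("PCE mean ~", round(mean, 4), " variance ~", round(variance, 4))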

Reliability and sensitivity analysis
Polymer composites are being increasingly utilized by aerospace, automotive, marine and many other technologically advanced industries, where performance reliability is an important criterion for product design. For sustainable development, robust design and safe operation of complex systems, it is essential to evaluate the reliability of these systems, including the identification of anomalies [120].
Production and development of large-scale polymer composites involve materials and manufacturing processes with considerable inherent variability. This makes reliability analysis an integral part of material design.
Considering the numerous structural and non-structural applications of polymer composites and the different types of loading that these composite products are subjected to during their service period, it is imperative to perform reliability analysis for their safe industrial application. Lately, ML based methodologies have been used by researchers to evaluate the reliability of complex systems and to address variance-based sensitivity analyses at macro and micro levels for polymer composite laminates, wherein it was reported that the natural frequencies are most sensitive to the fiber orientation angle, followed by the material properties.

Sampling techniques for training machine learning models
To build an efficient machine learning model and analyze data, it is absolutely necessary to select the right subset of data based on which the 'learning' will take place. The selected subset should be optimally identified and representative of the entire analysis domain, and it should not miss any of the important features that may govern the resulting behavior of the material. Such sampling techniques provide reliable information at a much lower computational cost. For selecting the data subset from the distribution of interest, various sampling techniques are used to ensure that the selected subset is free from selection and measurement bias. The successful development of any ML model depends largely on the experimental design used for obtaining the learning data [136]. To estimate a function or a system that is computationally expensive, it is required to generate optimal sample points that can represent the entire analysis domain when integrated with an appropriate machine learning algorithm [137,138].
Sample points can be generated using various available sampling algorithms like Central Composite Design (CCD) [139,140], Latin hypercube design [141], uniform design [142], D-optimal design [143], Hammersley sequences [144] and orthogonal array sampling [145,146]. Alam et al. [147] compared a few of these sampling techniques in order to study the impact that experimental design has on the development of a neural network model; they found the best performance for the model developed using the Latin hypercube design technique. Central composite design lacks efficacy when it comes to design problems with high-dimensional data [148]. Sampling-assisted ML models have also been used in an attempt to achieve complete and uniform curing along with reduced residual stresses in fiber-reinforced composites [140,154]. Later, the same group extended their study concerning prediction and identification problems related to web-core composite panels and reported that the quasi-random Sobol sequence outperforms other sampling techniques like Halton, Latin hypercube and random uniform sampling [155]. Using an appropriate and optimal sampling technique to build machine learning models often results in efficient predictions and less computational effort [154,[156][157][158].
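A minimal sketch of generating such designs is given below, assuming scipy's quasi-Monte Carlo module (scipy.stats.qmc, available in scipy 1.7 and later) and hypothetical bounds for three process parameters; the discrepancy measure printed at the end is one simple way to compare how uniformly the designs fill the space.

# Sketch of generating training designs with Latin hypercube and Sobol sampling
# (requires scipy >= 1.7; bounds below are hypothetical process-parameter ranges).
from scipy.stats import qmc

d = 3                                    # e.g. fiber fraction, ply angle, cure temperature
l_bounds = [0.1, 0.0, 120.0]
u_bounds = [0.6, 90.0, 180.0]

lhs = qmc.LatinHypercube(d=d, seed=0).random(n=64)               # unit-hypercube design
sobol = qmc.Sobol(d=d, scramble=True, seed=0).random_base2(m=6)  # 2**6 = 64 points

lhs_design = qmc.scale(lhs, l_bounds, u_bounds)                  # map to physical ranges
sobol_design = qmc.scale(sobol, l_bounds, u_bounds)

# Lower discrepancy indicates more uniform space filling
print("LHS discrepancy  :", qmc.discrepancy(lhs))
print("Sobol discrepancy:", qmc.discrepancy(sobol))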

Widely adopted machine learning algorithms in polymer composites
In pursuit of advancing polymer composites by discovering novel materials, tailoring application-specific properties of these materials and evaluating the various governing parameters so as to arrive at an optimized design, researchers are continuously investigating the exploitation of the growing capabilities of machine learning algorithms. Such research activities have resulted in many successful attempts, which are summarized in the following subsections considering the most widely adopted machine learning algorithms.

Neural networks
The neural network is an iterative approach that is the most preferred algorithm among many material science researchers for investigating various data-intensive aspects, and it forms the basis of many key advancements in the field of artificial intelligence over the last few decades. It is a mathematical tool inspired by the biological nervous system and is used to solve a wide range of engineering and scientific problems by recognizing the underlying relationships in the available data [159]. In the human brain, billions of neurons are connected together within a network which helps in processing the flow of information to generate meaningful outputs. Similarly, in neural networks, there are a number of neurons that act as processors operating in parallel and arranged in different layers. The first layer receives all the information (preprocessed data) to be considered and is known as the input layer.
Then there is an intermediate layer, commonly called the hidden layer. It contains many discrete nodes and is responsible for all the computations [160]. The final layer, known as the output layer, provides the final results of the prediction. Figure 12 shows the basic architecture of an Artificial Neural Network (ANN). There are estimation algorithms within the network that assign synaptic weights to the input parameters and then calculate the output. The robustness and efficiency of this method lie in its ability to handle a large amount of data with huge covariate spaces by making use of nonlinear mapping functions [161]. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) are predictor functions which are frequently used in artificial neural networks [162]. These predictor functions help to minimize the error in the prediction of outputs. A feedforward architecture with backward propagation is usually followed for output computation and error minimization. In a feedforward architecture, no loops are formed in the entire network, and the units of the successive layers do not receive any feedback. In backpropagation, however, the synaptic weights are adjusted by propagating the error backwards [163]. Weights are updated after each record is run through the network. One iteration is completed when all the records have run through the network, which is technically known as an epoch, and the process is repeated after completing one epoch. Mathematical equations (activation functions) link the weighted sums of each layer with the succeeding layer and deliver the output. For a single hidden layer, the architecture of an ANN model can be mathematically defined as

y = f(W_2 \, f(W_1 x + b_1) + b_2),

where y is the output vector, x is the input vector, f is the activation function (sigmoid, Tanh, softmax, softplus etc.), W_1 and W_2 are the matrices that contain the synaptic weights, b_1 is the column vector of biases from the input layer to the hidden layer and b_2 is the column vector of biases from the hidden layer to the output layer.
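A minimal numpy sketch of the single-hidden-layer forward pass defined above is given below; the layer sizes, weights and input values are hypothetical, and the training step (back-propagating the error to update W_1, W_2, b_1 and b_2) is omitted for brevity.

# Forward pass of a single-hidden-layer network, y = f(W2 f(W1 x + b1) + b2),
# with small hypothetical weights (training by back-propagation not shown).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n_in, n_hidden, n_out = 4, 8, 1             # input, hidden and output layer sizes

W1 = rng.normal(0, 0.5, (n_hidden, n_in))   # synaptic weights, input -> hidden
b1 = np.zeros(n_hidden)                     # biases, input -> hidden
W2 = rng.normal(0, 0.5, (n_out, n_hidden))  # synaptic weights, hidden -> output
b2 = np.zeros(n_out)                        # biases, hidden -> output

x = np.array([0.35, 0.5, 0.8, 0.02])        # hypothetical normalized input features
hidden = sigmoid(W1 @ x + b1)               # hidden-layer activations
y = sigmoid(W2 @ hidden + b2)               # network output
print("predicted output:", y)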
Figure 12: Basic architecture of an Artificial Neural Network (input parameters, input layer, hidden layer, output layer and output).

ANN is one of the most used ML algorithms for the purpose of data classification, prediction, clustering, pattern and image recognition [164,165]. The successful applications of ANN in diverse fields include prediction of economic stability, remote sensing, weather forecasting, estimation of crop production and the areas of medicine, environment, mining, materials and nanotechnology [166][167][168]. Owing to the efficacy of this function approximation method, it has been used to predict the mechanical properties of various composite materials with little experimental effort [169][170][171][172]. Matos et al. [183] correlated the parameters of the induction welding process with the temperature variation in the overlapping zone of laminates made of carbon fiber reinforced polyetherketoneketone composites using neural networks. To train the network, data was obtained from two sets of experiments: one set was performed using a vacuum bag and the other using a KVE tool (a thermoplastic assembly platform). Temperature sensors and fiber optics were installed in the overlapping region to record the variation in the temperature field when induction heating was applied for a particular time period. This experimental data was then processed before being fed to the network so as to have temperature-time consistency. Two neural networks with the same architecture were used separately for the two datasets obtained from the above-mentioned sets of experiments. The accuracy of the network was tested based on the difference between the experimental and the predicted values. In another study, S-N curves and Constant Life Diagrams (CLD) were produced by making use of only 50% of the experimental data; these diagrams were further utilized for design purposes. Ramasamy et al. [190] used a feedforward ANN to predict the behavior of GFRP composites subjected to a drop impact test. The data fed to the ANN model was obtained experimentally by conducting a drop impact test with the acoustic emission technique, wherein the purelin adaptive learning function along with the gradient descent algorithm was used.
Polynomial Neural Network (PNN) is a class of neural frameworks that has recently gained popularity in composites for the prediction of material behavior. Predictive modeling based on polynomial neural networks is found to be more accurate compared to other fuzzy models [191]. PNNs are most preferred when the output is a higher-order polynomial function of the inputs. Assaf et al. [192] modeled the fatigue behavior of glass fiber reinforced polymer composites using feedforward recurrent neural networks and polynomial classifiers in order to compare the results of both networks. The orientation angle of the fibers, the R-ratio and the ultimate stress were used as the input parameters in both cases, and the polynomial classifiers were found to be more accurate in predicting the fatigue life of these composites. It was noted that the predictions from both approaches were comparable with other existing fatigue life prediction techniques. Kumar et al. [193] used polynomial neural networks to model the buckling behavior of sandwich plates. The results obtained using the PNN model were found to be in line with the results of Monte Carlo simulation.
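PNN implementations differ in how the polynomial structure is grown; as a simplified, illustrative stand-in for the basic idea of mapping inputs through higher-order polynomial terms, the sketch below fits a third-order polynomial model to hypothetical fatigue-type inputs (fiber angle, R-ratio, ultimate stress) with scikit-learn. It is not the PNN algorithm of the cited studies.

# Simplified polynomial-basis regression as a stand-in for the polynomial mapping
# idea behind PNNs (hypothetical inputs: fiber angle, R-ratio, ultimate stress).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.uniform([0.0, -1.0, 200.0], [90.0, 0.5, 600.0], size=(300, 3))
# Hypothetical log-fatigue-life surrogate used only to generate toy data
y = 7.0 - 0.02 * X[:, 0] + 0.5 * X[:, 1] - 0.004 * X[:, 2] + rng.normal(0, 0.1, 300)

model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0))
model.fit(X, y)
print("training R2:", model.score(X, y))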
Another important class of neural networks is the Convolutional Neural Network (CNN), which falls under the category of deep learning. Researchers these days are effectively making use of CNNs for discovering new materials with optimal performance. It is basically an algorithm that has been successfully used for analyzing images and extracting meaningful results [194][195][196]. Successful CNN-based studies in this direction include those of Abueidda et al. [197,198]. Different variants of neural networks have been successfully used by material scientists to predict the behavior of composite materials under different conditions, and some more of such related studies are presented in Table 1. In view of this effectiveness and the several successful attempts, the adoption of neural networks in the analyses of polymer composites is well worth considering; a critical review of other prominent machine learning techniques is presented in the following subsections.

Linear and Logistic regression
Linear regression is a simple, old, and extensively used technique to make predictions when the target variable is continuous (for regression problems). It predicts the output variable based on a straight fit line, and the prediction function is of the following form:

Y = WX + b

It considers that the input (X) and output (Y) variables are linearly related to each other [260]. W represents the weight and b represents the bias, which is the intercept used to offset the predictions. Weight and bias are the variables that the algorithm learns in order to make the most accurate predictions. It may be noted in this context that it is possible to bring in some degree of non-linearity between X and Y by appropriately using transformations (such as power, logarithmic, exponential etc.) of the input and output variable sets and thereby still using linear regression as per the equation above. In logistic regression, on the other hand, probabilistic predictions are made by fitting a curve to the given data [261]. For this purpose, a sigmoid function of the following form is commonly used:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = WX + b

Considering the simplicity of these ML algorithms, they have been extensively used for prediction and classification in a multitude of problems [262][263][264]. Sakaguchi et al. [265] explained the correlation between the polymerization contraction force and the energy density with the help of linear regression.
Noryani et al. [266] used various statistical models for the material selection of natural fiber reinforced polymer composites. Based on the correlation coefficient, analysis of variance and the R-squared value, linear regression model was chosen for this purpose. Another group [267] developed a conductive composite material made of polyurethane filled with carbon black, wherein they studied the response of this composite material at a particular temperature and vapor pressure. A linear regression model was used to explain the relationship between these two parameters and the response rate of the composite.
Kleverlaan et al. [268] investigated the shrinkage stress developed in resin composites in the field of dentistry. They used the framework of linear regression to establish correlations between the shrinkage and the contraction stress, and between the shrinkage stress and the tensile modulus. They found that the amount of shrinkage/contraction stress and the tensile modulus developed in the material largely depend on the content of resin which is not polymerized. Gu et al. [269] applied logistic regression to predict the performance of polymer composites having different combinations of geometry and material at a low computational cost. Xu et al. [270] made use of the logistic regression model to identify various damage modes in adhesive joints of fiber-reinforced polymer composites. They found that different damage modes, viz. adhesive failure, delamination, fiber breakage and matrix/fiber separation from the adhesive, could all be distinguished within the subspace of the acoustic emission parameters used. The main application of the logistic regression method has been found in classification problems, e.g. for identifying the key parameters affecting the performance of a material using the given structural signals and for the classification of defects and damage in composite materials [158,159]. Cao et al. [273] used the algorithm of logistic regression to assess the stability of different polymers based on thermogravimetric analyses. Folorunso et al. [274] used the approach of logistic regression for the parametric analysis of the electrical conductivity of different polymer composites. Polymer based materials are often used as dental filling materials, but during the initial hours of curing a few substances release from the filling and might have harmful effects. Berge et al. [275] applied the framework of logistic regression to investigate the effect of those substances on the health of a foetus. Osburg et al. [276] used logistic regression to classify consumer interest in buying wood-based polymer composites.
Logistic regression has been successfully used to investigate the thermal response of polymer composites by multiple researchers [277,278].
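A minimal sketch of both fits described above is given below, using scikit-learn on hypothetical two-feature data; the linear model learns the weight and bias of Y = WX + b for a continuous property, and the logistic model passes the same kind of linear score through a sigmoid to give a class probability (here a notional above/below-threshold label). All names and thresholds are illustrative.

# Minimal sketch of linear regression (continuous target) and logistic regression
# (binary classification), with hypothetical composite data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, size=(200, 2))            # e.g. filler fraction, cure index

# Linear regression: Y = WX + b for a continuous property
y_cont = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.05, 200)
lin = LinearRegression().fit(X, y_cont)
print("weights:", lin.coef_, "bias:", lin.intercept_)

# Logistic regression: sigmoid of a linear score gives a class probability,
# here a notional above/below-median strength label
y_class = (y_cont > np.median(y_cont)).astype(int)
log = LogisticRegression().fit(X, y_class)
print("class-1 probability for a new sample:", log.predict_proba([[0.4, 0.6]])[0, 1])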

Gaussian process (GP) and kriging
Gaussian process (GP) is a non-parametric stochastic process which is a powerful tool for solving non-linear problems [279]. To bridge the gap between computer simulations and physical conditions, the Gaussian process is one of the widely used Bayesian approaches [280]. This method is most commonly used in geostatistics, which deals with modeling the spatial aspects of the world. GP has found huge acceptance in various fields like oceanography, finance and physics [281][282][283][284][285]. With the increasing complexity of datasets, simple parametric approaches tend to lack accuracy and effectiveness, and the implementation of neural networks also becomes tricky in such situations [286]. But the development of kernel methods like Support Vector Machines (SVM) and GP has made inference and prediction quite flexible and effective even if the data available for training is small. A Gaussian process, in a nutshell, is a group of random variables with the assumption that any finite collection of them follows a joint Gaussian distribution [287]. A Gaussian process is characterized by two functions, namely its mean, m(x_i), and its covariance, k(x_i, x_j), where i, j vary from 1 to n. The parameter n is the number of data points and x is the input vector.
The expression for a basic GP model is given by

f(x) \sim \mathcal{GP}(m(x), k(x, x')),

and the final prediction is expressed as a Gaussian distribution conditioned on the training dataset [288]. A Gaussian process involves an unbounded number of parameters that keeps growing with the training data [289,290]. GP is mostly used to model the probability of data in which the variables/features are of a continuous nature [291].
Considering the efficacy of the method, many researchers have used GP for predicting material properties and responses. Another surrogate model based on the Gaussian process that is considered cost-effective and compact for complex computations is kriging. Kriging uses the idea of interpolation directed by prior covariances in order to obtain the responses corresponding to unseen data. The unknown response in this method is represented as

\hat{y}(x) = f(x) + Z(x),

where \hat{y}(x) is the unknown function of interest, x is an n-dimensional vector (n is the number of design variables), f(x) is the known approximation function, which is usually a polynomial, and Z(x) represents the realization of a stochastic process with zero mean and variance \sigma^2 along with nonzero covariance. A hybrid approach of kriging and ML covariates has been used to make successful spatial interpolations in environmental sciences. Neural networks alone cannot give results as accurate as specific geostatistical tools, but using the outputs of neural networks as covariates in a kriging model produces highly accurate results for geospatial analyses [301]. Kriging has also been successfully used to classify and cluster data with good accuracy [302]. Considering the simplicity and reduced computational time of the Gaussian process and kriging, in recent years this methodology has been used extensively to model the physical properties of different polymer composites with fairly good accuracy [303][304][305][306][307][308][309][310][311][312][313][314][315][316][317].
Lately, Mukhopadhyay et al. [106] investigated the performance of various kriging model variants (such as ordinary kriging, universal kriging based on a pseudo-likelihood estimator, blind kriging, co-kriging and universal kriging) to study the dynamic behavior of polymer composite laminates. It was revealed that universal kriging coupled with the marginal likelihood estimate yields the most accurate results, followed by co-kriging and blind kriging. In terms of computational efficiency, it was observed that for high-dimensional problems the CPU time required for constructing the co-kriging model is significantly less compared to the other kriging model variants.
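A minimal sketch of GP (kriging-type) regression is given below, assuming scikit-learn's GaussianProcessRegressor with an RBF covariance plus a noise term, fitted to a small hypothetical dataset; the point of the example is that the model returns both a prediction and its uncertainty, which is what makes GP/kriging attractive as a surrogate.

# Sketch of Gaussian-process (kriging-type) regression with an RBF covariance,
# fitted to a small hypothetical dataset and queried with predictive uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, size=(25, 1))                 # e.g. normalized fiber fraction
y = np.sin(4 * X).ravel() + rng.normal(0, 0.05, 25)     # hypothetical noisy response

kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)          # prediction and its uncertainty
for xi, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {xi:.2f}: prediction {m:.3f} +/- {1.96 * s:.3f}")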

Support vector machines (SVM)
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used to solve classification and regression problems [318]. When it comes to solving classification problems comprising huge data, SVM has always been considered one of the most classic techniques under the broad domain of machine learning [319,320]. In SVM, each data point is plotted in x-dimensional space, where x is the total number of features available. Here the goal is to find a hyperplane that classifies/splits all the training vectors into different classes; the hyperplane is always an (x - 1)-dimensional separator [321]. SVM is based on the idea of optimization, where the hyperplane is chosen such that the distance between the support vectors (data points) and the data separator has the widest possible margin [322]. Support vector machines are versatile in their application as they make use of kernelization, which allows the algorithm to model non-linear decision boundaries by adding another dimension to the given data [323]. Abuomar et al. [330] used a support vector machine to classify the mechanical properties of vinyl ester nanocomposites reinforced with vapor grown carbon fibers. The entire dataset was classified into ten classes and 3-fold cross validation was used. In order to assess the performance of the SVM classifier, confusion matrices were used in different sets. This method was proven to be a very quick and reliable technique for classifying large datasets. Surface damage is unavoidable during the machining of polymer composites. Delamination is the most common form of damage observed during the drilling of fiber reinforced polymer composites, and it becomes the primary reason for material failure as the overall strength is reduced. In this view, Aich et al. [331] made an attempt to use a machine learning algorithm to accurately predict the delamination factor. They exploited the support vector machine to investigate the underlying effect of each input variable on the overall responses of the material, which further helped in modeling the delamination factor. Xu et al. [332] used SVM in combination with particle-swarm optimization algorithms to predict the degree of damage in cables made of carbon fiber reinforced polymer composites. The particle swarm algorithm made use of Acoustic Emission (AE) signals and later trained the SVM with AE parameters. This paper demonstrated the feasibility of using AE with an ML algorithm to judge the extent of damage in composites. Another research group [333] performed nanoindentation testing on CFRP composites to characterize their structural integrity.
They used SVM and K-nearest neighbors for identifying the reinforcement and assessing its quality. A correlation between the interphase properties of the composite and its reinforcement was also established. Altarazi et al. [334] fabricated polymer composites using two different techniques namely, compression moulding and extrusion blow moulding. They used nine ML algorithms to predict the flexure strength of the composites as a function of material composition and manufacturing conditions.
Out of these nine ML models, support vector machine showed the best results with maximum accuracy.
Dey et al. [335] investigated the effect of cutout on stochastic dynamics of composite laminates using a SVM assisted finite element approach, where a high level of computational efficiency was established.
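The sketch below illustrates kernelized SVM classification and regression on hypothetical machining data (two features standing in for, e.g., drilling speed and feed rate, with a synthetic delamination factor and a derived damage label); feature scaling is included because SVM performance is sensitive to it, and all names and thresholds are illustrative.

# Sketch of SVM classification and regression with kernelized decision boundaries
# (hypothetical two-feature data, e.g. drilling speed and feed rate).
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, size=(300, 2))
delamination = 1.0 + 0.8 * X[:, 0] ** 2 + 0.3 * X[:, 1] + rng.normal(0, 0.02, 300)
damaged = (delamination > 1.6).astype(int)               # hypothetical damage label

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)).fit(X, damaged)
reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)).fit(X, delamination)

print("classification accuracy:", clf.score(X, damaged))
print("predicted delamination factor:", reg.predict([[0.5, 0.5]])[0])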

Ensemble based methods
Ensemble methods are a combination of various base algorithms which result in a predictive model that outperforms each one of the individual models. Integrating various ML models in order to obtain an intelligent global predictive system results in reliable decisions with the least generalization error [341]. The main idea is to weigh various individual models and accordingly combine them to produce the most optimal one [342]. The idea of this methodology under the domain of supervised learning came into existence in the late seventies, and researchers have been exploring it since [343]. The first ensemble method was formed by combining two linear regression models and the nearest neighbor method to enhance the performance of recognition systems [344]. To determine the target variable, decision trees are very often used within ensemble methods [345,346].
Multiple applications of such ensemble based ML methods can be traced in the area of polymer composites. Liu et al. [347] used an ensemble based methodology to predict fields of elastic deformation in the polymer composites. Different regression models were combined together to make accurate predictions. Dataset used for the training purpose was obtained from 3D microstructural images of the composites. In order to compare the advantages of additive manufacturing over the conventional method, Zhang et al. [348] used ensemble based machine learning model to predict the flexural strength of CFRP composites and compared the results with a linear regression and Physics based model. Ensemble based methodology achieved the best accuracy along with establishing an efficient relationship among the number of fiber layers, its orientation and matrix infill patterns.
Another research group [349] also adopted this methodology to analyze the thermographic data of CFRP composites for classifying the surface defects. Gaudenzi et al. [350] used an experimental procedure to classify the damage caused in CFRP composite laminates. They used low velocity impact to induce delamination in the composites and then applied wavelet packet transform to obtain damage features from the dynamic response of affected and non-affected specimens. Subsequently, those features were classified using ensemble based methods. In general, the ensemble based methods are extensively used in the area of structural health monitoring and damage assessment of composite structures [351,352]. Zang et al. [353] characterized natural fiber reinforced polymer composites for their surface finish using acoustic emission sensors. The complex heterogeneous structure of these composites results in adverse effects on the integrity of the machined surface. Therefore, to understand the time-frequency relationships of acoustic emission signals and their correlation with the material cutting mechanism, fiber orientation and speed of cutting during the machining process, ensemble based method was exploited. Pathan et al. [354] made use of Gradient Boosted tree Regressor (GBR) to predict the mechanical response of unidirectional fiber reinforced polymer composites based on their microstructural images. This GBR is an ensemble based technique whose performance depends on the number of estimators, learning rate and the depth of decision tree. Guo et al. [355] applied ensemble based learning to polyimide nanocomposites in the form of gradient boosting so as to predict their breakdown field strength. They prepared 32 composite samples varying in thickness and material composition. Voltage test equipment was used to measure the strength experimentally. Their prediction was reported to be accurate. From the concise literature survey presented in this subsection, it may be noted that the nature of considering a myriad of ML algorithms and then integrating these algorithms together to produce the optimal one is what makes ensemble techniques stand out [356].
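As a minimal sketch of one such ensemble, the example below trains a gradient-boosted tree regressor, whose performance is governed by the number of estimators, the learning rate and the tree depth as noted above, on hypothetical microstructure descriptors, and reports a cross-validated accuracy; the data-generating function is purely illustrative.

# Sketch of an ensemble (gradient-boosted trees) regressor, whose performance is
# governed by the number of estimators, learning rate and tree depth.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.uniform(0.0, 1.0, size=(400, 4))                  # hypothetical microstructure descriptors
y = 200 * X[:, 0] + 50 * np.sin(3 * X[:, 1]) + 20 * X[:, 2] * X[:, 3] + rng.normal(0, 2, 400)

gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0)
scores = cross_val_score(gbr, X, y, cv=5, scoring="r2")   # 5-fold cross-validated accuracy
print("mean cross-validated R2:", scores.mean())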

Accuracy checking criteria
The most crucial step in developing any predictive model is evaluating its performance. It is very important to evaluate the prediction accuracy of the ML model for unseen data. Based on whether the problem in hand is regression or classification, different performance evaluation metrics are used. Confusion matrix, accuracy, precision, recall and the receiver operating characteristic (ROC) curve are a few metrics commonly used for assessing the performance of a classification problem [357,358]. The confusion matrix is an n × n matrix that represents the actual and predicted results of a classification problem, where n represents the number of classes taken into consideration [359]. The ROC curve is constructed from the sensitivity and the specificity. Dwivedi [362] considered six different ML algorithms for the prediction of heart disease and compared their performance on the basis of the ROC curve, where logistic regression was found to have the highest classification accuracy. Probability Density Function (PDF) plots are also utilized to statistically define the likelihood of a particular result and the data distribution. PDF plots give a measure of the expected value (mean of the random variables) and the variance. The classification rate is another accuracy measure used for various classifiers when the problem in hand has multiple classes [363]. The classification rate is determined as

\text{Classification rate} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}.

On the other hand, for regression problems the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and coefficient of determination are frequently used for checking prediction accuracy. These metrics give an idea of how close/far the predicted values are from the actual ones:

\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert A_i - P_i \rvert, \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (A_i - P_i)^2},

where n indicates the number of observations, A_i refers to the actual value and P_i refers to the corresponding predicted value. The coefficient of determination is based on the variance of the model considering the dependent and the independent variables. It is commonly denoted as R^2 and its value always lies between 0 and 1, where a higher value indicates better accuracy. For judging the best correlation between the various parameters involved, scatter plots and parallel coordinate plots are also used. These plots are very helpful in characterizing the performance accuracy of different ML algorithms with a certain confidence interval. Li et al. [364] performed a comparative study to see the effectiveness of both these plots for determining the performance accuracy, and scatter plots were found to outperform the other metric. Scatter plots are considered one of the basic tools used for quality control, where point-wise cross verification of the prediction is possible. N.K. Ostadi [365] used artificial neural networks, support vector machines and radial basis functions to model the roughness index of pavement. He used scatter plots and sensitivity analysis to compare the qualitative performance of the mentioned algorithms. Also, the generalization accuracy of these algorithms was tested using a quantitative evaluation metric, the mean square error. The radial basis function was found to be the best algorithm based on the combined results of the qualitative and quantitative performance measures. Anomalies in the form of outliers in the data affect the prediction accuracy of every ML algorithm [366]. Quick visualization of the available data is very important in applied machine learning to ensure a fair amount of accuracy in the predictions. To identify specific patterns and outliers in the data, box plots are used to visualize the distribution of the available numeric attributes through quartiles [367]. The originality of the data can be well maintained using the method of box plots for the detection of outliers, as this method does not require any pre-processing of the data [368]. To assess the prediction accuracy of ML algorithms, the bootstrap method, which is a resampling technique, is also used.
This method involves the iterative resampling of a dataset with replacement. The advantage of using bootstrap method is that the resulting sample often has a Gaussian distribution and it is easy to determine the variation in the actual and predicted values in terms of standard error, variance and confidence interval [369]. In most cases of polymer composites, applying more than one of these metrics to check the performance of ML model is imperative for identifying anomalies and for making reliable predictions [111].
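The sketch below computes these metrics on hypothetical actual/predicted values: MAE, RMSE and R^2 for regression, a confusion matrix for a thresholded two-class version of the same data, and a simple bootstrap resampling of the MAE to obtain a 95% interval; the data and the threshold are illustrative.

# Sketch of common accuracy-checking metrics for regression and classification,
# plus a simple bootstrap estimate of the error's confidence interval.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, confusion_matrix

rng = np.random.default_rng(9)
actual = rng.uniform(100, 200, 50)                       # hypothetical measured values
predicted = actual + rng.normal(0, 5, 50)                # hypothetical model predictions

print("MAE :", mean_absolute_error(actual, predicted))
print("RMSE:", np.sqrt(mean_squared_error(actual, predicted)))
print("R2  :", r2_score(actual, predicted))

# Confusion matrix for a two-class problem derived from a threshold
cls_actual = (actual > 150).astype(int)
cls_pred = (predicted > 150).astype(int)
print("confusion matrix:\n", confusion_matrix(cls_actual, cls_pred))

# Bootstrap: resample with replacement and report a 95% interval on the MAE
maes = [mean_absolute_error(actual[idx], predicted[idx])
        for idx in (rng.integers(0, 50, 50) for _ in range(1000))]
print("MAE 95% interval:", np.percentile(maes, [2.5, 97.5]))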

Summary and perspective
Machine learning uses a range of statistical, probabilistic and optimization techniques to comprehend a physical space and subsequently predict, analyze and identify. It offers the opportunity to study larger and complex polymer composite systems that are not currently amenable to characterization, design and optimization. This review paper is an attempt to benefit the material scientists and encourage them to exploit these competent algorithms in order to identify and develop new features and multi-functionalities which have been hitherto unexplored due to computational limitations. It is a perplexing task to understand the algorithms used for predictive modeling thoroughly. But on the basis of this extensive literature survey, a few critical remarks for different ML algorithms are given below.
• For problems having too many input variables, resulting in a high-dimensional input vector space, support vector machines work best in most cases.
• For drawing inferences from noise-inflicted data, decision tables and neural networks perform better for classification and regression problems, respectively.
• Logistic regression is one of the simplest algorithms for dealing with non-linear problems.
• Gaussian process has shown good results when the problem involves a combination of continuous and discrete variables.
• Polynomial chaos expansion is found to be very effective in quantifying the unavoidable uncertainties of polymer composites.
• If there is scope for designing the input samples, D-optimal design (a DoE method), Sobol sequence and Latin hypercube sampling normally lead to faster convergence and outperform other sampling techniques.
• The superior performance of any ML algorithm over others is problem-specific. No single ML algorithm is perfect for all problems; rather, the performance of any algorithm (accuracy, stability and computational efficiency) depends largely on the structure and size of the available data, the dimension of the input parameter space, the complexity of the model, etc.
In recent years, machine learning has been adopted by numerous material scientists and engineers owing to its striking success in various other fields. However, considering this widespread application, it is important to address some of the problems that arise when these algorithms are applied to practical cases [370], together with the complexity of modeling polymer composite materials, as highlighted in the following subsections.

Curse of dimensionality
As mentioned earlier, the success of any ML algorithm depends on the data which is fed into the network (i.e. the quality of learning). A decent predictive model always requires good and sufficient data to learn from. Usually, the data used for training purposes is obtained either from computational simulations or from experiments. In order to produce viable results, data of sufficient quantity and proper dimensionality is required [371]. Most ML algorithms work fine with input feature vectors of low dimensionality (provided that the input-output relationship is not too nonlinear or complex), but the same algorithms may lose their credibility when the dimensionality of the input space increases. This can be attributed to the fact that, with increasing dimensionality, data sparsity increases and the amount of training data needed to cover the input space with a given density grows rapidly, so reliable predictions demand far larger datasets than are usually available.
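A simple numerical experiment illustrates this sparsity effect: as the dimensionality of uniformly sampled points grows, the relative contrast between the nearest and farthest distances to a query point shrinks, which undermines distance-based learning. The sample sizes below are arbitrary illustrative choices.

```python
# Minimal sketch of distance concentration, one manifestation of the curse of
# dimensionality: in high dimensions all points look almost equally far away.
import numpy as np

rng = np.random.default_rng(42)

for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                  # 500 random points in [0, 1]^d
    q = rng.random(d)                         # a query point
    dist = np.linalg.norm(X - q, axis=1)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d = {d:4d}  relative distance contrast = {contrast:.3f}")
```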

Overfitting
Overfitting is a challenge often faced by data scientists when the data in hand and the available knowledge are not sufficient to determine the objective function correctly. In such cases, the model tends to make random predictions which are not coherent with the actual results [375]. This may create the illusion that good accuracy has been attained in the training phase, while the predictions are in fact so random that the accuracy in the testing phase may be as low as 40-50%.
It can be better understood by considering the bias-variance tradeoff. Bias is the tendency of the model to learn consistently on the basis of wrong assumptions and to keep learning the same wrong patterns, while variance is the error generated by the model's sensitivity towards small fluctuations. Variance makes the model learn random noise regardless of the correct signal, which results in random outputs rather than the intended ones. Feeding the network with too many or too few attributes/features is the main cause of overfitting or underfitting, respectively, and both result in poor predictions [376].
Cross-validation is one possible solution to combat this problem. Another solution is to penalize the model by adding a regularization term to the function used for evaluation [377]. Feature selection is one of the best solutions to avoid overfitting and increase the efficiency of any machine learning algorithm: the data used for training should be curated so that only the relevant and useful features are retained [378,379]. Methods like information gain, chi-square and the correlation coefficient measure the individual effect of a feature on the target variable, after which only the selected features are included for training [380][381][382]. These are known as filter methods. There are also wrapper and embedded techniques that help in overcoming the problem of overfitting [383,384]. Wrapper methods involve techniques like feature elimination and genetic algorithms: to assess the usefulness of the given features, multiple feature subsets are created and corresponding models are generated; these models are individually checked for their prediction efficiency by analyzing the error, and the best model is then selected. However, these techniques are computationally very expensive due to their iterative nature. Embedded methods like decision trees are helpful in combating overfitting and are also less expensive [385,386]. Using these techniques can therefore increase the efficiency of an ML model by controlling the bias and variance.
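As a hedged illustration of these counter-measures, the following sketch combines a filter-style feature selection step, L2 (ridge) regularization and k-fold cross-validation using scikit-learn; the synthetic dataset and the specific parameter values (20 features, 8 retained, 5 folds) are assumptions made purely for demonstration.

```python
# Minimal sketch: filter feature selection + regularization + cross-validation
# to control overfitting on a synthetic regression problem.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 20 candidate features, only 8 of which are informative
X, y = make_regression(n_samples=120, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)

# Filter method (correlation-based score) keeps only the most relevant
# features before fitting the L2-regularized (ridge) model.
model = make_pipeline(SelectKBest(score_func=f_regression, k=8),
                      Ridge(alpha=1.0))

# 5-fold cross-validation gives a more honest estimate of generalization
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Cross-validated R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```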

Mixed variable problems
Many real-life problems involve datasets of mixed variables, i.e. both continuous and discrete. Discretization of data of a continuous nature may result in considerable information loss. The quest to understand the complete material behavior under specific conditions has resulted in many mixed-variable optimization problems. Especially with clustering algorithms, where different distance metrics are used to draw patterns in the data based on similarity, mixed variables in the input space are challenging. To address this problem, algorithms such as the heterogeneous Euclidean-overlap metric, the value difference metric and the heterogeneous value difference metric are being used [387,388]. Designing polymer composites involves consideration of both qualitative and quantitative aspects in terms of their composition and microstructure, which often results in mixed-variable problems [308]. For example, while analyzing laminated composite plates or shells, the number of plies is a discrete variable while other parameters like ply angle and material properties are continuous variables. Kim et al. [389] compared the performance of ANN and Gaussian process in the presence of a mixed-variable input space and found that GP works better in such cases due to its interpolating nature. Parallel optimization algorithms have proven to be the most efficient in handling non-convex mixed-variable problems [390]. Particle swarm optimization algorithms are also found to be effective in solving problems involving continuous and discrete variables simultaneously [391]. Despite its high relevance, adequate literature is not available on mixed-variable problems concerning polymer composites; hence this area requires more attention in the near future, involving optimum design under multi-objective demands.
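For illustration, a minimal sketch of the heterogeneous Euclidean-overlap metric (HEOM) is given below: continuous attributes contribute a range-normalized Euclidean term, while categorical attributes contribute a simple overlap term (0 if equal, 1 otherwise). The laminate-like records (number of plies, ply angle, fibre type) and their assumed ranges are hypothetical.

```python
# Minimal sketch of the heterogeneous Euclidean-overlap metric (HEOM) for
# mixed continuous/categorical inputs.
import numpy as np

def heom(x, y, is_categorical, ranges):
    """HEOM distance between two mixed-variable samples x and y."""
    d2 = 0.0
    for xi, yi, cat, r in zip(x, y, is_categorical, ranges):
        if cat:
            d2 += 0.0 if xi == yi else 1.0          # overlap term
        else:
            d2 += ((xi - yi) / r) ** 2              # range-normalized term
    return np.sqrt(d2)

# Hypothetical laminate records: number of plies, ply angle (deg), fibre type
a = (8, 45.0, "glass")
b = (12, 30.0, "carbon")
is_categorical = (False, False, True)
ranges = (16.0, 90.0, 1.0)                          # assumed attribute ranges

print("HEOM distance:", heom(a, b, is_categorical, ranges))
```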

Robustness under the influence of noise
Noise in the training dataset is an inevitable problem that impacts the prediction accuracy of the ML algorithms, eventually affecting the robustness of these methods. Robustness is a measure of the quality of ML algorithms which defines the performance of different methods [392]. Any algorithm is said to be robust if it has the capability of being insensitive to the inherent noise in the data [375,376].
Bayesian decision rules and relative loss of accuracy are two conventional metrics used for measuring the degree of robustness of an algorithm [395,396]. Saez et al. [394] proposed a new evaluation measure, referred to as equalized loss of accuracy, by combining the concepts of performance and robustness. Every real-world problem is associated with some noise due to measurement error, equipment error, calibration error, wrong assumptions, wrongly assigned classes/labels, modeling error and many other unavoidable sources [397]. In the case of polymer composites, data obtained from tomographic studies, X-ray projection, non-destructive testing and acoustic emission is often noisy [156,398-400]. Consequently, pattern recognition and the identification of defects in the composites become difficult, resulting in inaccurate predictive modeling. Many researchers have carried out studies to understand and measure the effect of noise in training datasets. Mukhopadhyay et al. [401] introduced artificial noise to three structures, viz. a simply supported beam, a spring-mass-damper system and a fiber-reinforced polymer composite bridge deck, to evaluate the performance of central composite design and D-optimal design methods in conjunction with response surface modeling for the identification of structural damage. Both methods were found to perform satisfactorily for all three structures up to a noise level of 1.5%.
Subsequently, Mukhopadhyay et al. [402,403] quantified the effect of noise on machine learning based uncertainty quantification of polymer composites. Saseendran et al. [404] analyzed the impact of noise on two ML algorithms, namely polynomial and linear regression with ridge regularization; both algorithms suffered almost equal degradation in performance accuracy with an increasing percentage of noise in the data. The presence of noise in the dataset not only reduces the prediction and classification accuracy of a machine learning model, but also increases the learning time [405]. In order to identify noise in the data, ensemble, single-learning and distance-based techniques have been reported in the literature [406][407][408]. Ensemble based techniques have proven to be very effective in reducing the impact of noise on the damage identification of structures [409,410]. Since noise in a dataset is inevitable and the performance of ML algorithms depends significantly on the quality of the dataset, it is important for an algorithm to remain robust and stable even when some noise is present.
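A simple way to set up such a noise-sensitivity study, in the spirit of the works cited above rather than reproducing any of them, is sketched below: Gaussian noise of increasing magnitude is added to the training targets of a synthetic regression problem and the degradation of the test accuracy is recorded. The data, model and noise levels are illustrative assumptions.

```python
# Minimal sketch: measure how added training noise degrades test accuracy.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=0.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rng = np.random.default_rng(1)
for noise_pct in (0.0, 0.5, 1.0, 1.5, 3.0, 5.0):
    # Noise level expressed as a percentage of the target standard deviation
    y_noisy = y_tr + rng.normal(0.0, noise_pct / 100.0 * y_tr.std(), y_tr.shape)
    model = Ridge(alpha=1.0).fit(X_tr, y_noisy)
    r2 = r2_score(y_te, model.predict(X_te))
    print(f"noise = {noise_pct:3.1f}%  test R^2 = {r2:.4f}")
```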

Latest trends and future road maps
The ever-growing capabilities of machine learning regularly attract the interest of researchers to exploit it to the maximum extent for solving the complexities of material science. The availability of a variety of large and complex datasets calls for affordable and more powerful computational processes. In this view, hybrid machine learning is a recent development with the promise of solving high-dimensional problems with adequate intricacies. Hybrid machine learning works on the idea of combining multiple ML algorithms to increase the overall prediction capability through mutual tuning and by generalizing or adapting to unseen data [411]. Ensemble based methods are an example of hybrid machine learning, which has been adopted by many in the areas of anomaly detection, speech recognition, uncertainty quantification and the prediction of mechanical response concerning different types of composites [412][413][414][415]. A few successful attempts have also been reported in the area of polymer composites. Mukhopadhyay et al. [416] explored the possibility of using hybrid machine learning by combining two ML algorithms for addressing the stochastic impact mechanics of polymer composites and obtained an enhanced level of computational efficiency as well as accuracy; in their hybrid algorithm, polynomial chaos expansion (PCE) was used for global approximation, while local fluctuations were captured by kriging. Vu et al. [417] proposed a hybrid machine learning model to estimate the shear capacity of concrete strengthened with fiber-reinforced polymer composites. Their hybrid model was based on support vector machines, least squares and the firefly algorithm, and showed about 15% higher accuracy than artificial neural networks on the same problem. In general, hybrid machine learning has the potential to outperform individual ML methods [416].
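The general idea of combining complementary learners can be sketched with a simple stacking ensemble, as shown below; note that this is only an ensemble-style illustration on synthetic data, not the PCE-kriging hybrid of [416] nor the support vector/least squares/firefly model of [417].

```python
# Minimal sketch of a hybrid (ensemble-style) learner: two different base
# algorithms are combined through stacking so their predictions complement
# each other. Dataset and hyper-parameters are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=12, noise=5.0, random_state=0)

hybrid = StackingRegressor(
    estimators=[("svr", SVR(C=10.0)),
                ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=RidgeCV(),   # meta-learner combining the base predictions
)

print("Hybrid CV R^2:", cross_val_score(hybrid, X, y, cv=5, scoring="r2").mean())
```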
Other significant recent advances in machine learning include adaptive learning. Traditional machine learning uses training and prediction as two separate pipelines, whereas adaptive learning uses a single pipeline and works on the basis of reinforcement learning. It observes and learns from changes in the input and output values along with their allied characteristics. Adaptive machine learning accepts feedback from the working environment and then acts accordingly to make improved predictions. This adaptive framework has been found to be very promising for solving highly non-linear, dynamic systems even in the presence of uncertainties [76,418].
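The feedback-driven update loop at the heart of this idea can be hinted at with a small online-learning sketch: the model is refined every time new observations arrive rather than being trained once on a fixed dataset. This illustrates only the incremental-update aspect, not a full reinforcement learning agent, and the streaming-data generator is a hypothetical stand-in.

```python
# Minimal sketch of incremental (online) updating as new data arrives.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

def new_batch(size=20):
    """Hypothetical stand-in for data streaming in from sensors/experiments."""
    X = rng.random((size, 3))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.05, size)
    return X, y

for step in range(5):                 # feedback loop: observe, then update
    X_new, y_new = new_batch()
    model.partial_fit(X_new, y_new)   # model is updated, not retrained from scratch
    print(f"after batch {step + 1}: coef = {np.round(model.coef_, 2)}")
```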
Multi-scale problems are very common in polymer composites as these composites are made of different phases (for details, refer to section 1). Therefore, a multi-scale analysis method is generally used to account for the size effect of the phases or the added reinforcement on the overall behavior of polymer composites. The approach of molecular dynamics has been used for nano-to-continuum scale bridging in the development of efficient nano-composites [419,420]. However, molecular dynamics is computationally expensive and often intractable [421]. Adaptive ML has been successfully used to build efficient scale-bridging models [422,423]. Recently, seamless ML based algorithms have been presented for multi-scale optimization and uncertainty quantification of fiber-reinforced polymer composites [135]. It has been shown that a more elementary level of analysis in polymer composites can lead to better insights and design of the global properties. An evolving research area with exceptional promise is multi-fidelity modeling in machine learning. In this approach, inexpensive low-fidelity input data and expensive high-fidelity input data are optimally utilized to achieve a relatively high level of accuracy in efficient machine learning model formation [424,425]. Multi-scale systems like composites can benefit immensely from the multi-fidelity modeling approach in computational investigations involving machine learning.
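One common (though not the only) multi-fidelity strategy is discrepancy modeling: a cheap low-fidelity model provides the trend, and a Gaussian process trained on a few expensive high-fidelity samples learns the correction. The sketch below uses simple analytic functions as stand-ins for, say, coarse and fine finite element models of a composite; it is an illustrative assumption rather than the exact formulation of [424,425].

```python
# Minimal sketch of multi-fidelity modeling via discrepancy (delta) learning.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def low_fidelity(x):                 # cheap, biased model (stand-in)
    return np.sin(8.0 * x)

def high_fidelity(x):                # expensive, accurate model (stand-in)
    return (x - 0.2) * np.sin(8.0 * x) + 0.3

# A few expensive high-fidelity samples
x_hf = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
delta = high_fidelity(x_hf) - low_fidelity(x_hf)

# GP models the discrepancy between the two fidelities
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(x_hf, delta)

# Corrected prediction = cheap model + learned discrepancy
x_new = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y_mf = low_fidelity(x_new) + gp.predict(x_new)
print(np.c_[x_new.ravel(), y_mf.ravel(), high_fidelity(x_new).ravel()])
```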
Recently, the capability of feature selection in ML algorithms has been exploited to identify hitherto undiscovered features of composites that could affect the global responses [354]. The ability to identify subtle system features that are not obvious from our general physical understanding is particularly useful for complex material and microstructural systems like polymer composites. Even though the major application of ML in the field of polymer composites is still limited to efficient prediction for carrying out a plethora of computationally intensive analyses, the different aspects mentioned in this section (such as hybrid ML algorithms, adaptive and reinforcement learning, ML-assisted scale-bridging, mixed variable problems, multi-fidelity modeling, feature identification etc.) would lead the research of this field in the foreseeable future. Research in the area of polymer composites is increasing exponentially due to the innumerable advantages that these composites possess, while the implementation of machine learning is still at a nascent stage, but with extraordinary potential and a striking growth rate (refer to Figure 13). The comprehensive review presented in this article will help the concerned researchers and practitioners prioritize the required fields of research in exploiting the fast-evolving capabilities of machine learning for a better understanding and development of multi-functional polymer composites.