The Use of the Principal Component Analysis for the Prediction of Superlattices Hard Perovskites and Inverse Perovskites

We present an approach to predict which perovskites and inverse perovskites have the potential to achieve high hardness and fracture toughness for application as thermal barrier coatings (TBCs). We employ a thorough multivariate technique based on principal component analysis (PCA). Among the 129 perovskites and inverse perovskites tested, only ~10 compounds show an interesting potential as thermal barrier coatings. These results may serve as a map for the design of new perovskite-related multilayer ultra-hard coating materials.

Perovskites and inverse perovskites have a wide range of interesting physical and chemical properties, including electro-optical effects and piezo-, ferro- and pyroelectricity. Some of them also possess interesting thermal and mechanical properties, such as strength, stiffness, corrosion resistance, and high-temperature corrosion resistance, superior to those of ordinary metals.
Thermal barrier coatings (TBCs) are ceramic coatings deposited onto the surface of the blades used in the hottest section of gas turbine engines to protect the superalloy substrate from the hot-gas stream [8][9][10][11][12][13][14]. The upper layer is a ceramic topcoat whose prime function is to provide heat insulation through its low thermal conductivity, and it must withstand complex stress conditions [15][16][17][18][19][20]. Therefore, the selection of topcoat materials is constrained by some basic requirements: high temperature capability, low thermal conductivity, and superior fracture resistance (i.e. good damage tolerance) during thermal cycling under severe operating conditions. However, the current commercial TBC material YSZ (Y2O3 partially stabilized ZrO2) has an application temperature limit of 1200°C and therefore cannot be used for the next generation of gas turbine engines with higher operating temperatures. Perovskites or inverse perovskites could therefore be considered promising alternatives for extremely high temperature TBC applications, and could consequently improve the efficiency of turbine engines. Unfortunately, their room-temperature brittleness and poor fracture toughness are known to severely restrict their use. The question that remains is how to overcome this constraint and increase the ductility and fracture toughness of these compounds. One solution could be nanolayered superlattice coatings.
Several studies reveal that, in order to produce nanolayered superlattice coatings [21][22][23][24][25] with appreciable hardness enhancement, the layer materials should be chosen so that they exhibit a large difference in shear modulus (ΔG) [26]. When the superlattice period is below a certain threshold, which is a function of the layer materials, a large ΔG allows the layer interfaces to act as effective barriers to dislocation propagation from the softer layers to the harder layers under mechanical loading. We adopt this approach in order to predict new superlattice perovskite-based TBC materials.
In the present paper, we primarily aim to understand the trends in the elastic and mechanical properties of some perovskites and inverse perovskites using data-mining techniques. A large amount of data has been reported in the literature for perovskites and inverse perovskites. However, it is not only the creation of data, whether through calculation or experiment, that is important; a way to analyze the data in a comprehensive and robust manner is also necessary. Some of the challenges in searching through discrete data include the difficulty of analyzing large amounts of data, understanding the correlations among various properties, and using these correlations to better understand the underlying physics of the system. With a multivariate analysis, the data can be examined so that trends and correlations become apparent. A multiple selection criteria approach is proposed to screen the data rationally.
Additionally, the number of properties required to describe a potential TBC candidate may be reduced to a minimum, so that the problem of creating sufficient amounts of data and analyzing these data is reduced. These multivariate methods can be classified as linear and nonlinear extraction techniques.
Among the linear techniques, principal component analysis (PCA) [27,28] performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. It is well suited to the analysis of materials properties and has been used to address a variety of physics and materials science issues [29,39]. On the other hand, t-Distributed Stochastic Neighbor Embedding (t-SNE) [40] is a nonlinear technique for dimensionality reduction that is particularly well suited to the visualization of high-dimensional datasets.
t-SNE is extensively applied in image processing, NLP, genomic data and speech processing [41]. Recently, Park et al. [42] used the t-SNE method to explore the formability of mixed-ion perovskites. Their results are promising and demonstrate that t-SNE is a robust nonlinear technique for reducing the dimensionality of the input variable space. In this work, we use principal component analysis, since the principal components form an orthogonal basis sorted by the amount of variance along each dimension. Once fitted, PCA gives a linear transformation that can be applied to further points not in the fitted dataset. The same cannot be said for t-SNE, which directly minimizes, by gradient descent, the mismatch between the dataset and its low-dimensional representation. This gives a correspondence for the known points but not a function for new points, so one would have to interpolate post hoc or start from scratch [43].
For this purpose, we first report the possibility of increasing the hardness of these compounds through superlattice formation. We also predict the ductility of the corresponding superlattices.
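To illustrate why a fitted PCA can be reused on points outside the fitted dataset, the following is a minimal sketch using NumPy. The function names `fit_pca` and `project` are hypothetical and are not taken from this work; the sketch only centers the data, whereas any scaling used for the actual analysis is not specified here.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit a PCA on X (samples x descriptors); return the mean and principal axes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal axes,
    # already sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x_new, mean, components):
    """Apply the fitted linear map to points that were not part of the fit."""
    return (np.asarray(x_new, dtype=float) - mean) @ components.T
```

The stored mean and axes define a fixed linear map, so a new compound can be placed on an existing score plot without refitting. t-SNE provides no such map.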

II. Description of Principal Component Analysis
PCA is a classification method that projects the data onto a set of principal components (PCs) and maps the data into a dimensionally reduced space. The PC capturing the most information is associated with the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the original dataset. All PCs are orthogonal to each other, and thus each captures unique information. The advantage of PCA is that typically a few PCs are sufficient to describe a system, so a dataset of n dimensions can be reduced to a few dimensions with minimal loss of information. The PCs do not necessarily have an obvious physical meaning; rather, they are combinations of variables that explain the largest variation in the data. The reduction in dimensionality allows trends and correlations that are "hidden" in the data to be easily visualized and described in PC space. PCA decomposes the original data matrix into the scores and loadings matrices, where the score values classify the samples and the loading values classify the descriptors in terms of their separation of the samples. The correlations among the descriptors become obvious in a PCA, and by defining the correlations in the data, we can reduce the number of descriptors to a minimum to permit a more convenient data analysis.
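The decomposition into scores and loadings described above can be sketched as follows. This is a minimal NumPy illustration, not the software used in this work. The function name `pca_scores_loadings` is hypothetical, and the autoscaling step (zero mean, unit variance per descriptor) is an assumption made because the descriptors have different units.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Return (scores, loadings, explained_ratio) for a samples x descriptors array."""
    X = np.asarray(X, dtype=float)
    # Assumption: autoscale each descriptor so that units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the autoscaled data (the correlation matrix of X).
    C = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]       # one column per PC, orthonormal
    scores = Z @ loadings                      # coordinates of samples in PC space
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained
```

Here the loadings are unit-length eigenvectors. Some statistical packages instead report loadings scaled by the square root of the eigenvalue, which gives the correlation between each descriptor and each PC.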

III. Results
From a coating design perspective, knowledge discovery in databases can provide useful guidance for materials selection. In order to identify trends or clustering in materials property data, we construct a database of 71 perovskite compounds (28 oxides, 7 chlorides, 8 bromides, 25 fluorides and 3 iodides) and 15 descriptors, including the ionic radii (r_A, r_B, r_X), lattice parameter (a), elastic constants (C_ij), bulk modulus (B), shear modulus (G), hardness (H), Cauchy pressure (C_p), the Pugh modulus ratio (B/G) and the fracture toughness (K_ic). Table S1 contains the dataset used.
The tolerance factor "t", as proposed by Goldschmidt [44], is given by:
$$t = \frac{r_A + r_X}{\sqrt{2}\,(r_B + r_X)} \qquad (4)$$
The octahedral factor "µ" is given by:
$$\mu = \frac{r_B}{r_X}$$
The Cauchy pressure is given by:
$$C_p = C_{12} - C_{44}$$
In the expressions for the other mechanical descriptors, V_0 is the volume per atom (in m^3), and B and G are in MPa.
PCA is used to assess the correlation between each of the descriptors input into the regression analyses and the stability of the compounds. The results of these analyses can then be compared with the predictive models to understand the physics and limitations of the models. The PCs do not necessarily have an obvious physical meaning, but rather are a combination of descriptors which explain the largest variation in the data. The advantage of PCA is that, since each PC uniquely captures the effect of a certain combination of relevant descriptors, typically a few PCs are sufficient for describing a system.
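The structural descriptors used in this section (the tolerance factor "t", the octahedral factor "µ" and the Cauchy pressure) can be computed with a few lines of Python. This is a sketch for illustration only. The function names are hypothetical, and the Cauchy pressure is taken as C_12 - C_44, as used later in this section.

```python
import math

def tolerance_factor(r_a, r_b, r_x):
    """Tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))

def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = r_B / r_X."""
    return r_b / r_x

def cauchy_pressure(c12, c44):
    """Cauchy pressure C_p = C_12 - C_44 (same units as the elastic constants)."""
    return c12 - c44
```

All radii must be given in the same unit. Both "t" and "µ" are dimensionless ratios.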
The first analysis examined whether, in our case, PCA captures the differences between the different perovskites. The resulting scores plot is shown in Fig. 1a. For this analysis (Fig. 1a), the sign of each principal component has only relational meaning. PC1 (F1) captures 55.02% of the variance, whereas PC2 (F2) captures 18.15%. The two PCs together capture ~73% of the variance of the data in Table S1. Therefore, a dataset of n dimensions (15 initial descriptors in this case) can be reduced to a few dimensions (2 PCs) while capturing ~73% of the original information. The reduction in dimensionality allows trends and correlations that are "hidden" in the data to be easily visualized and described in PC space, as can be seen in Fig. 1a.
In this figure, two important clusters appear: one of oxide perovskites and one of halide perovskites. Furthermore, within the oxide region, we observe a clear separation between the lanthanides and the transition metals. As PC1 increases, the shear modulus (G) and the fracture toughness (K_ic) increase; on the other hand, as PC2 increases, B/G and H increase (see Table S1). Therefore, a score plot can be a simple tool for identifying compounds with interesting mechanical and structural properties.
The loadings plot corresponds to the scores plot but represents the variance among descriptors. Figure 1b shows the loadings plot corresponding to the samples shown in Fig. 1a. The axes of the scores plot and the loadings plot are the same, so the information in the plots can be compared directly. The angles between the vectors tell us how the descriptors correlate with one another. When two vectors are close, forming a small angle, the two variables they represent are positively correlated. If they meet at 90°, they are not likely to be correlated. When they diverge and form a large angle (close to 180°), they are negatively correlated.
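The angle rule for reading a loadings plot can be written as a short helper. The sketch below is illustrative. The function names and the 30° tolerance used to decide what counts as "close to" 0°, 90° or 180° are assumptions, not values taken from this work.

```python
import numpy as np

def loading_angle_deg(v1, v2):
    """Angle in degrees between two descriptor vectors in the (PC1, PC2) plane."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def describe_correlation(angle_deg, tol=30.0):
    """Qualitative reading of an angle; tol is an assumed, adjustable tolerance."""
    if angle_deg <= tol:
        return "positive"
    if angle_deg >= 180.0 - tol:
        return "negative"
    if abs(angle_deg - 90.0) <= tol:
        return "weak or none"
    return "intermediate"
```

This reading is only as reliable as the fraction of variance captured by the two plotted PCs, since descriptors that look close in two dimensions may separate along higher PCs.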
The impact of a descriptor increases with its distance from the origin. We notice from Fig. 1b two different clusters: descriptors with negative PC1 (a, r_A, r_B, µ, and r_X) and descriptors with positive PC1 (t, C_12, B, C_44, C_11, G, H, C_p, K_ic). Globally, "a" is inversely correlated to all the mechanical properties. In particular, "a" and "B" are inversely correlated: as "a" increases, "B" decreases. B/G does not appear to be correlated to H (~90°). B/G is also inversely correlated to the octahedral factor "µ"; perovskites with a low "µ" should therefore have a large B/G and could be more ductile. The tolerance factor "t", in turn, is correlated to "C_p", which means that the Cauchy pressure is highly sensitive to the crystalline structure of the perovskites.
Since the relative impact of each descriptor in a loadings plot is identified by measuring its absolute distance from the origin, we display below the PC equations as derived from the eigenvalue analysis:
$$\mathrm{PC1} = -0.717\,r_A - 0.657\,r_B - 0.688\,r_X - 0.857\,a + 0.458\,t - 0.178\,\mu + 0.905\,C_{11} + \dots \qquad (9)$$
For PC1, the coefficients of C_11, C_12, C_44, B, G and K_ic are the most important (~0.9), whereas for PC2, µ, H and B/G have the highest weighting (~0.7). These results confirm the observations made on the score plot of Fig. 1a.
Properties with similar PC values are highly correlated, while inverse PC values indicate inverse correlations. Globally, we observe that "a" and the ionic radii are inversely correlated to almost all the mechanical properties. On the other hand, we notice that C_11, C_44 and K_ic behave in the same manner (their loadings lie very close together).
We have also performed a PCA calculation for 58 inverse perovskites. The resulting scores plot is shown in Fig. 2a. PC1 (F1) captures 40.47% of the variance, whereas PC2 (F2) captures 27.45%. The two PCs together capture ~68% of the variance of the data in Table S2. We notice three regions. Region "A" corresponds to column 2 of the periodic table (Ca3, Sr3, Ba3); here, as PC1 decreases, the ionic radius of X increases. Region "B" corresponds to column 3 (Sc3); here, as PC1 decreases, the radius of ion A decreases (Tl, In, Ga, Al). Finally, region "C" corresponds to the other columns. On the other hand, as PC2 increases, G and H increase, whereas as PC1 increases, B/G and K_ic increase. These behaviors are completely different from those observed for perovskites. Figure 2b displays the loadings for the inverse perovskites. We clearly observe that "a" and "B" are inversely correlated, as in the oxide perovskites. B/G appears to be inversely correlated to H (~180°), whereas the tolerance factor "t" is correlated to "B". In the PC equations derived from the eigenvalue analysis, the dominant descriptors have weightings of ~0.8. These results confirm the observations made on the score plot of Fig. 2a.
In this paper we are mainly interested in the ability of perovskites and inverse perovskites to deform (ductility) or to fracture (brittleness). It is well known that some of these compounds have superior mechanical properties, but almost all are brittle. Ductility occurs as atoms slide past one another in a bulk solid through dislocations.
There are two independent engineering elastic moduli: the shear modulus (G) and the bulk modulus (B). These quantities can be connected to single-crystal elastic constants using different averaging techniques. The shear modulus is an indicator of the mechanical hardness H, whereas the bulk modulus represents a measure of the average bond strength of the atoms in the crystal and is proportional to the cohesive energy.
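As an example of such an averaging technique, the sketch below applies the standard Voigt-Reuss-Hill average for a cubic crystal. This is an assumption for illustration, since the averaging scheme used to build the dataset is not stated here. The function name `cubic_vrh_moduli` is hypothetical.

```python
def cubic_vrh_moduli(c11, c12, c44):
    """Bulk and Hill-averaged shear modulus of a cubic crystal.

    Inputs and outputs share the same units (e.g. GPa).
    """
    bulk = (c11 + 2.0 * c12) / 3.0                      # same in Voigt and Reuss for cubic
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0             # upper bound (uniform strain)
    g_reuss = (5.0 * (c11 - c12) * c44
               / (4.0 * c44 + 3.0 * (c11 - c12)))       # lower bound (uniform stress)
    g_hill = 0.5 * (g_voigt + g_reuss)                  # Hill average of the two bounds
    return bulk, g_hill
```

For an elastically isotropic crystal (C_11 - C_12 = 2 C_44), the Voigt and Reuss bounds coincide and the shear modulus equals C_44.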
We present in Fig. 3 the variation of "B" versus "G" in order to reveal the ductility trend, as indicated by the "ductility" arrow. SrUO3, SrTiO3 and SrVO3 for oxide perovskites, and NbCPt3 and SnCPt3 for inverse perovskites, have large "B" and "G"; this behavior is also clearly seen in the PCA results (Fig. 1a and Fig. 2a, the arrow). We also display in Fig. 4 the Cauchy pressure "C_12 - C_44" versus "B/G", since the ductility trend of certain cubic materials is based on the degree of angular character of the chemical bonding. As a general observation, ductile materials have positive values of Cauchy pressure, which correspond to more isotropic metallic bonding. Brittle materials, on the other hand, exhibit negative values of Cauchy pressure, which result from a more angular character of the bonding. The ratio "B/G", in turn, is considered a parameter of the ductile versus brittle performance of solids. Ductility is characterized by a high "B/G" ratio (> 1.75), while a low "B/G" is representative of brittleness. We observe that CaCrO3 and SbNNi3, for oxide perovskites and inverse perovskites respectively, have a large "B/G" and "C_p". This is also clearly seen in the PCA results, since these compounds are isolated from all the other materials (Fig. 1a and Fig. 2a). We also display in Fig. 5 the variation of hardness versus fracture toughness. The compounds that seem to present high hardness H and fracture toughness K_ic (Fig. 5) are SnCPt3 and SrVO3 for inverse perovskites and perovskites, respectively. However, even though CaCrO3 and SbNNi3 have a high bulk modulus (Fig. 6), they have poor hardness and fracture toughness.
Based on these results, we may conclude that the mechanical properties of perovskites and inverse perovskites can be predicted from the principal component analysis.
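The two ductility indicators discussed here, the Pugh ratio B/G and the Cauchy pressure, can be evaluated together. The following sketch is illustrative. The function name and the dictionary keys are hypothetical, while the 1.75 threshold and the sign criterion are those quoted in the text.

```python
def ductility_indicators(bulk, shear, c12, c44, pugh_threshold=1.75):
    """Evaluate the Pugh ratio and Cauchy pressure criteria for one compound."""
    pugh = bulk / shear          # B/G, dimensionless
    cp = c12 - c44               # Cauchy pressure, same units as C_ij
    return {
        "B/G": pugh,
        "Cp": cp,
        "pugh_ductile": pugh > pugh_threshold,   # high B/G suggests ductility
        "cauchy_ductile": cp > 0,                # positive Cp suggests metallic-like bonding
    }
```

The two criteria need not agree for a given compound, which is one reason to consider them jointly.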

Based on these intrinsic properties of the compounds, we introduce a new criterion, "B/G & C_p & H & K_ic & B" (where & is the logical AND), in order to predict the most interesting compounds that could be used as thermal barrier coatings (TBCs). We propose that any compound which satisfies these five conditions jointly (C_p > 0, B/G > 1.75, H > 2, K_ic > 2 and B > 140) defines a minimal "green light" region for potential TBC compounds. The perovskites and inverse perovskites which may be good TBC candidates are displayed in Table S3. We notice that the inverse perovskites seem to be more reliable candidates than the oxide perovskites.
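The joint criterion can be expressed as a simple filter. The sketch below is illustrative. The `Compound` class, the function name and the two example entries are hypothetical placeholders rather than data from Table S3. The thresholds are those stated in the text and are assumed to be in the same units as the source tables.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    B: float      # bulk modulus
    G: float      # shear modulus
    H: float      # hardness
    K_ic: float   # fracture toughness
    Cp: float     # Cauchy pressure

def is_tbc_candidate(c: Compound) -> bool:
    """All five conditions must hold jointly (logical AND)."""
    return (c.Cp > 0 and c.B / c.G > 1.75 and c.H > 2
            and c.K_ic > 2 and c.B > 140)

# Hypothetical placeholder values, for illustration only.
examples = [
    Compound("hypothetical-1", B=180.0, G=90.0, H=5.0, K_ic=2.5, Cp=30.0),
    Compound("hypothetical-2", B=180.0, G=120.0, H=9.0, K_ic=2.5, Cp=30.0),
]
candidates = [c.name for c in examples if is_tbc_candidate(c)]
```

In this example, only the first entry passes. The second fails because its B/G ratio of 1.5 is below 1.75, even though it is harder.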
The question that remains is whether we are able to select, from the results of the principal component analysis, materials that can potentially meet the property requirements. One needs to concentrate on the region where the materials exhibit a combination of relatively high hardness and ductility. High hardness and high fracture toughness correspond to region "A" in the score plot, whereas region "B" represents ductile compounds. The situation is reversed for inverse perovskites. Therefore, interesting materials could lie at the frontier between these two regions.
In this article we are also interested in the possibility of making artificial materials with high hardness and fracture toughness through a thin-coating approach. The method consists of obtaining high-hardness coatings by depositing onto the surface a repeating layered structure of two materials with nanometer-scale dimensions. These structures are called "superlattices". As studied by several authors [47][48][49], superlattices are characterized by the distance between each successive pair of layers, "d", which is known as the "bilayer repeat period". Xi Chu et al. [50] have demonstrated that the hardening effect of the interfaces is reduced when the layers are narrow. They explained that the decrease in hardness at large "d" is due to dislocations moving within individual layers, since they are not able to cross the interfaces.
In this work we have used the approach introduced by Koehler [51], who suggested for the first time in 1970 that a high-strength material could be obtained by fabricating a layered structure of two materials with the same crystal structure. The interfaces between the layers can act as barriers to the motion of dislocations, and restricting the motion of dislocations strengthens this type of material. If a dislocation moves into a layer with a higher shear modulus, the strain energy increases. Consequently, in an A/B superlattice a repulsive force arises that increases as dislocations in a layer "B" with a smaller modulus, G_B, approach the interface with a layer "A" with a larger modulus, G_A. According to Koehler's model, the critical stress required to move a dislocation across an abrupt interface is proportional to a factor "Q" that increases with the difference in shear modulus between the two layers. Therefore, a superlattice in which the difference in modulus between the two layers, "ΔG", is large will have a large critical stress and so a large hardness enhancement.
Since superlattices based on perovskites or inverse perovskites remain almost unexplored, we may ask: do perovskites or inverse perovskites offer a better material combination for superlattice (SL) coatings?
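For reference, the modulus-mismatch factor in Koehler's model is commonly written as below. This form is quoted from the general superlattice literature as an assumption; it is not a reconstruction of the expression used in this work, and the numerical values in the example are hypothetical.

```latex
% Assumed standard form of the Koehler modulus-mismatch factor, with G_A > G_B
\[
  Q = \frac{G_A - G_B}{G_A + G_B} = \frac{\Delta G}{G_A + G_B},
  \qquad \sigma_c \propto Q\, G_B
\]
% Worked example with hypothetical moduli G_A = 150 and G_B = 100 (same units):
\[
  Q = \frac{150 - 100}{150 + 100} = 0.2
\]
```

Under this form, Q vanishes when the two layers have equal shear moduli and approaches 1 when G_A is much larger than G_B, which is consistent with a large ΔG giving a large hardness enhancement.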
The information presented in Figs. 3-5 is used to design superlattice hard coatings. As discussed, materials with a large "Q" should be considered for synthesizing superlattice coatings to achieve effective hardness enhancement. Applying this criterion to the calculated perovskites and inverse perovskites, we may anticipate from (...) hardness enhancement due to their small ΔG. These perovskites and inverse perovskites all have a cubic structure according to the values of their tolerance and octahedral factors (Table S4). However, we observe a large lattice mismatch for CaZrO3/CaMoO3, NbCPt3/GaNNi3 and NbCPt3/ZnNNi3, which may further enhance the hardness of these superlattices. These analyses are purely predictive and should be supported experimentally.
We can also anticipate, from the PCA results, potential materials for superlattice hard coatings. Any combination of a material from region A as template and a material from region B as substrate could give interesting superlattices, e.g. CaCrO3/SrVO3 and SnCPt3/SbNNi3. We notice from the different calculations of ΔG and the position of each compound in the PCA score plots that the distance between two materials from different clusters is correlated to ΔG: as this distance increases, ΔG increases.
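The relation between score-plot distance and ΔG can be checked numerically. The sketch below is a minimal illustration under stated assumptions. The function name `distance_vs_delta_g` is hypothetical, the Pearson coefficient is one possible measure of the correlation, and for simplicity all pairs are compared rather than only pairs from different clusters.

```python
from itertools import combinations

import numpy as np

def distance_vs_delta_g(scores, shear):
    """Pairwise PC-space distance and |delta G| for every pair of compounds.

    scores: (n_compounds, n_PCs) array of PCA scores.
    shear:  length-n array of shear moduli G.
    Returns (pairs, distances, delta_g, pearson_r).
    """
    scores = np.asarray(scores, dtype=float)
    shear = np.asarray(shear, dtype=float)
    pairs = list(combinations(range(len(shear)), 2))
    dist = np.array([np.linalg.norm(scores[i] - scores[j]) for i, j in pairs])
    delta_g = np.array([abs(shear[i] - shear[j]) for i, j in pairs])
    r = float(np.corrcoef(dist, delta_g)[0, 1])   # Pearson correlation
    return pairs, dist, delta_g, r
```

A coefficient close to 1 on real scores would support reading ΔG directly from the separation of two compounds on the score plot.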
Therefore, the logic presented here can be applied to any system with any number of samples and descriptors. Combining informatics with calculated data and physical properties will allow for the greatest understanding of structure-property relationships. With that knowledge, materials can then be engineered to maximize the desired properties. The use of PCA here demonstrates how informatics can be used to screen information to determine what is necessary and useful, and then to use that knowledge in experimental, computational, and materials design.

Conclusion:
In this work we have analyzed perovskite and inverse perovskite compounds using a multivariate analysis. This work helps to develop a method for visually interpreting a PCA plot based on the correlation of the distances between the different perovskite compounds on the different plots. Using the correlation between the PCA results and the variation of ΔG, we have explained how to design better superlattice hard coating materials. Thus, we expect that a simple visual inspection of the PCA plots, with respect to the position of any perovskite compound in these plots, will give insight into the variation of ΔG.