Machine Learning the Metastable Phase Diagram of Materials

Phase diagrams are an invaluable tool for material synthesis and provide information on the phases of the material at any given thermodynamic condition. Conventional phase diagram generation involves experimentation to provide an initial estimate of thermodynamically accessible phases, followed by use of phenomenological models to interpolate between the available experimental data points and extrapolate to inaccessible regions. Such an approach, combined with first-principles calculations and data-mining techniques, has led to exhaustive thermodynamic databases albeit at distinct thermodynamic equilibria. In contrast, materials during their synthesis, operation, or processing, may not reach their thermodynamic equilibrium state but, instead, remain trapped in a local free energy minimum, that may exhibit desirable properties. Mapping these metastable phases and their thermodynamic behavior is highly desirable but currently lacking. Here, we introduce an automated workflow that integrates first principles physics and atomistic simulations with machine learning (ML), and high-performance computing to allow rapid exploration of the metastable phases of a given elemental composition. Using a representative material, carbon, with a vast number of metastable phases without parent in equilibrium, we demonstrate automatic mapping of hundreds of metastable states ranging from near equilibrium to those far-from-equilibrium. Moreover, we incorporate the free energy calculations into a neural-network-based learning of the equations of state that allows for construction of metastable phase diagrams. High temperature high pressure experiments using a diamond anvil cell on graphite sample coupled with high-resolution transmission electron microscopy are used to validate our metastable phase predictions. Our introduced approach is general and broadly applicable to single and multi-component systems.


Introduction
Materials synthesis has traditionally relied on "thermodynamic phase diagrams" to provide information about the stable phases as a function of various intensive state properties such as temperature, pressure, and chemical composition. The conventional method for generating a phase diagram involves experimentation to provide an initial estimate of phase boundaries followed by the use of phenomenological models to interpolate the available experimental data points and extrapolate to experimentally inaccessible regions. Such an approach combined with atomistic simulations and recent data-mining techniques has led to well-established exhaustive thermodynamic databases [1,55,62] for different materials-albeit limited to phases observed near thermodynamic equilibria. However, following material synthesis and processing, or during operation, most materials may be trapped in local minima of the energy landscape, that is, in metastable states (see Figure  1(a)). Solid carbon is a prototypical system exhibiting such behavior, with large number of known metastable allotropes at room temperature and atmospheric pressure. Importantly, these allotropes have contrasting properties ranging from metals [58,27,39,42], semiconductors [64], topological insulators [50,11,37], and wide band gap insulators [67]. There is likely a vast and rich phase space of metastable structures for multicomponent materials, with some exhibiting exotic and potentially desirable properties. The demand for such materials motivates the move beyond the traditionally explored area of near-equilibrium materials. Towards this goal, the generation of exhaustive datasets of "metastable phase diagrams", mapping the equation of states for phases without parent in thermodynamic equilibrium, is highly desirable but has remained elusive.
Creating a phase map for metastable materials is a non-trivial and data-intensive task. The first challenge is to employ an efficient structure optimization algorithm capable of identifying both global (ground state) and local (metastable) minima of the energy landscapes in the configurational space. The next challenge is to map the free energy surface (i.e. the equation of state) for each of these metastable phases as a function of the intensive thermodynamic state variables (P , T and X), over the range in which the phase information is desired. As this information is usually discretized into a finite grid of state variables, one needs to find the free energies(G(T, P, X)) of individual metastable phases at each grid point. This step quickly becomes computationally prohibitive for large numbers of metastable configurations, and, in practice, requires a surrogate model, to approximate the free energy calculations of more expensive first-principles based approach (e.g. ab-initio molecular dynamics). The final, major challenge after the equation of state for all the phases are computed is to classify and identify the phase boundaries at varying degrees of nonequilibrium i.e. the areas of the phase diagram in which a metastable structure is dynamically decoupled from lower energy structures.
Here, we report an automated framework that addresses the above challenges by integrating a genetic algorithm with first-principles calculations, classical molecular dynamics simulations, machine learning (ML), and high-performance computing to allow the generation and exploration of the metastable materials. Our framework allows the automatic discovery, identification, and exploration of the metastable phases of a material, and 'learns' their equations of state through a deep neural network. To test the efficacy of our framework, we use the representative and highly-significant example of carbon -a system well-known to exhibit a large number of metastable allotropes -and map its metastable phase diagram in a large range of temperatures (0-3000 K), pressure (0-100 GPa) and excess free energy (up to 500 meV/atom above thermodynamic equilibrium). Importantly, we show that the proximal phases to thermodynamic equilibrium (within 50 meV/atom) can be observed experimentally in high pressure high temperature (HPHT) processing of graphite. In particular, we identify a new cubic-diaphite metastable configuration that explains the diffraction pattern of the previously reported n-diamond [25], demonstrating the potential of our approach to guide the synthesis of materials beyond equilibrium.

Method
Our workflow is summarized in Figure 1. We construct metastable phase diagrams with the chemical information of the periodic system as input, along with the range of pressure and temperature of interest.
The ground and metastable states at a given set of thermodynamic conditions (T, P ) correspond to global and local minima of the free energy in the configurational space, G({r i }, a, b, c, α, β, γ), where a, b, c, α, β, γ are the lattice parameters and {r i } are the position of the basis atoms. As explained in details below, we identify the metastable phases by sampling the energy landscapes at fixed thermodynamic conditions. We then compute, at the identified minima, the Gibbs free energy in the thermodynamic space as function of intensive variables G(T, P ); the free energy and the relative energetic ordering of its minima varies with (T, P ) as illustrated in Figure 1 (a),(b). Upon identification, the free energy and stability of these phases at (T, P ) is represented as a graph (Figure 1(c)) with nodes corresponding to the free energy of the phases and the edges to the free energy barrier connecting them. This discrete thermodynamics representation is made Figure 1: (a) and (b) Schematic illustration of the free energy landscape in the configurational space at different conditions (T 1 , P 1 ), (T 2 , P 2 ). The phases corresponding to the minima are labeled χ, ψ, ω. GS, M1 and M2 stand for ground state, near-equilibrium and far-from equilibrium metastable phases; (c) Graph representation of the energy landscape. Nodes correspond to the phases and the edges contain the barrier height; (d) Equation of state for χ, ψ and ω (e) Illustration of the metastable phase diagram as a function of ∆G; (f) Our workflow to identify metastable configurations and construct the metastable phase diagram continuous as a function of (T, P ), and the crossing points in equation of states automatically identified. Finally, we generate the full metastable phase diagram, P (T, P, ∆G), where P is the most energetic phase within a free energy ∆G of the ground state at a given (T, P, ∆G).

Evolutionary structure prediction
The first step in our workflow is to identify the periodic structures that are energetically favorable for a given chemical composition. We use an evolutionary search based on genetic algorithm -known to be efficient for periodic systems [44,15,49]. Briefly, evolutionary algorithms aspire to optimize the atomic arrangement r 1 ,r 2 ,...r n and the lattice parameters (a, b, c, α, β, γ) of a population of structures over different regions of the energy landscape through genetic variations and selections over successive iterations. The structure corresponding to the global minima is the ground state equilibrium structure. Conversely, the metastable phases at a finite temperature and pressure correspond to the local minima of the Gibbs free energy landscape G({r i }, a, b, c, α, β, γ). Hence, evolutionary algorithms are naturally suited to locate candidate metastable phases over the configurational space by evolving a pool of structures at the same time. Although G includes both the temperature(-T S) and pressure (P V ) contributions, for computational cost efficiency, we only include the effect of finite pressure in the selection of the offspring structures, by optimizing enthalpy at 0 K and fixed pressure, H(T = 0 K, P ). The entropic contribution will be computed in the subsequent steps of our workflow.
We perform evolutionary structure search at several different pressures (P = 0 GP a, P = 10 GP a & P = 100 GP a) independently by minimizing H(T = 0 K, P ). The search is initiated with a population of N = 40 randomly generated crystal structure satisfying the following geometric constrains: (i) no two atoms are closer than 0.5Å, (ii) total number of atoms in the unit cells lies between 4 and 20, (iii) length of the lattice vectors (a, b, c) lie between 2Å and 20Å (iv) lattice angles (α, β, γ) lie between 20 • and 160 • . Each member of the population is locally relaxed using density functional theory with Perdew, Burke, Ernzerhof approximation before computing its enthalpy. Including such chemically informed local optimization accelerates the search for minima and avoids sampling unphysical configurations. In addition, we also perform independent evolutionary structure searches using the long-range carbon bond-order potential (LCBOP) [21] model. Classical models like LCBOP are cheaper compared to DFT and allows for a quick search over the vast configurational space.
During the search, "parent" structures are selected from the current population with probabilities proportional to their fitness, computed based on H(T = 0 K, P ) (see supporting information S1). The phase with the lowest(highest) enthalpy has the highest(lowest) fitness. Genetic operations are performed on the parents to produce new "offspring" crystal structures. Subsequently, a new generation is constructed from N best structures from the previous generation and the new offspring structures. The above cycle is repeated until the enthalpy of the top N/8 structures are within the desired tolerance. Further details on the structure search algorithm such as the selection criteria, types of genetic operations and parameters used, are provided in the supporting information (section S1). All the new phases encountered during the search and their corresponding enthalpy values are recorded. Candidate phases for free energy calculations are identified from a collated list of structures from several independent evolutionary structure search at different pressures.

Metastable phase identification
While any point (T, P ) in the equilibrium phase diagram shows the ground state, there exists several local minima (metastable phases) in the configurational space ({r i }, a, b, c, α, β, γ) separated by a finite free energy difference ∆G (Figure 1(c),(d)). Figure 2(a) depicts some of representative metastable structures found during structure search. Cubic diamond and hexagonal graphite appear in the experimental equilibrium phase diagram [7] of carbon. Apart from the equilibrium phases, our evolutionary structure search algorithm also identified metastable structures like the hexagonal diamond, 4H phase, other stacking combinations of cubic and hexagonal diamond (stacking disorder), distorted cubic diamond, distorted hexagonal diamond (diaphite), which are also observed in our HPHT experiments (see below). At a given pressure, the structure with minimum enthalpy (H ground ) is the ground state at 0 K -in the case of carbon, graphite at 0 GPa. At 0 K, we have H(T = 0 K, P ) = G(T = 0 K, P ). Hence, we can use a cutoff criteria based on H(T = 0 K, P ) to screen the candidate metastable phases for the subsequent free energy calculation. We define a ∆H cut−off and neglect the structures whose enthalpy is more than ∆H cut−off from the ground state. Upon convergence of the evolutionary structure search, we only select the structures that satisfy H < H ground +∆H cut−off for further analysis. In the present work, we set ∆H cut−off = 670 meV /atom, comparable to the excess enthalpy of C60 fullerene (∆H C60 = 608 meV /atom) that, we hypothesize, should be large enough to include the thermodynamically relevant metastable structures. Among the selected structures, we group geometrically similar and layered structures (for example hexagonal graphite, orthorhombic graphite, rhombohedral graphite) based on the radial distribution function, angular distribution function (see supporting information, section S1.2) which further reduces the number of candidate structures for free energy calculation. After performing the above selection and grouping of structures, we narrow down to 505 candidate metastable structures for free energy calculation.

Free energy calculation and discrete phase diagram
The candidate structures obtained after performing the previous steps are entirely based on the enthalpy values at 0 K. However, the metastability of a structure at a finite temperature is determined based on the Gibbs free energy G(T, P ). The continuous axis of temperature and pressure in the phase diagram are discretized into 2D grid (in this case 16x16) within the range of interest as shown in Figure 2(b) to reduce the number of free energy calculations. The Gibbs free energy of each candidate screened from the previous step is computed at each of the grid points. At a given temperature and pressure:

Metastable Phase Diagrams
(1) In solids with few atomic components, the vibrational contribution to the entropy is the dominant one [61], and hence we make the approximation: If the atomic vibrations are modeled as harmonic oscillators, it follows that The enthalpy of the system, H(T, P ), is estimated at each of the grid points from MD simulations by equilibrating under NPT ensemble, while the entropic part of the free energy is computed from the phonon spectra using Eq. (3). The phonon spectra and the vibrational free energies are calculated at the equilibrium density obtained from MD simulations at the corresponding T and P . Further details on the MD simulations and the vibrational free energy calculations can be found in supporting information section S1.3. The MD simulations are performed using LAMMPS package [46] and the phonon spectra is calculated using the PHONOPY package [59].
Once we have the free energies of all the structures, the discretized phase diagram is constructed by comparing the G(T, P ) of the candidate structures at each point in the 2D grid.

Phase-dependent equations of state through Deep Neural Networks
As shown below, recently developed machine learning (ML) methods [35,48,41] for developing inter-atomic potentials [4,5,28,47] or estimating atomistic or molecular properties [51,70,12] can be used to compute Gibbs free energy as a continuous function of T, P . In particular, the equation of state of a phase can be predicted directly given only the 0 K structural information of a phase, allowing us to quickly estimate a (T, P ) region wherein a specific phase has low Gibbs free energy, and can be potentially realized in the experiments.
We use a deep neural network (DNN) that takes as an input the many-body tensor representation [29] (MBTR) of a phase, along with T and P information. The DNN is trained on the Gibbs free energy data of 248 phases out of the 505 carbon phases. Regularization techniques, such as dropout and early stopping, were utilized to avoid overfitting. Some important low energy metastable phases, namely, hexagonal diamond, defective cubic diamond (S228) and 4H (S20), were intentionally left out from the training process and were used to evaluate the DNN performance. More details on the DNN architecture, training, and the MBTR descriptor are provided in the supporting information section S3.

Phase boundary classification -Discretized to continuous phase diagram
The final stage in our workflow is to clearly identify the phase boundaries as a function of (T, P, ∆G) separating the different phases. Machine learning algorithms like support vector machines (SVMs) [13,10] which can draw the decision boundaries between different classes of inputs are well suited to automate the estimation of such phase boundaries. SVMs are binary classifiers by definition and one have to resort to decomposition techniques like "one-vs-all" or "one-vs-rest" [6] which involves training many classifiers and taking the weighted value of all the output. While such decomposition techniques have been successfully used in the past, it is computationally demanding to train multiple classifiers when there are a large of number phases. Instead, we use a purely multiclass SVM [65,14,36,22,31] (MSVM), using a non-homogenous 4th order polynomial kernel, which can classify multiple classes without relying on decomposition techniques. Training only one MSVM classifier reduces the computational time tremendously while maintaining the accuracy of the classifiers. The final equilibrium and metastable phase diagram can be generated with the decision boundaries drawn using MSVM ( Figure 2).

Equilibrium phase diagram
We first validate our workflow by constructing equilibrium phase diagram and comparing against the experimental graphite-diamond phase boundary [7,21]. The discretized equilibrium phase diagram (Figure 2(b)) is constructed by comparing the G(T, P ) of diamond and graphite, and plotting the phase with a lower G at each grid point. The color of the points correspond to the color of the structures shown in Figure 2(a). As expected, from the experimental phase diagram [7,53], the cubic diamond phase is dominant at high pressure whereas graphite is more stable in the low-pressure region. Importantly, our predicted diamond-graphite phase boundary matches well with the experimental phase boundary (dashed line in Figure 2(c)).
We also note that G(T, P ) for S132 and S353 are very close to diamond at moderate pressures and slightly lower (∆G/k B T < 0.3) than diamond and graphite at high temperatures (see supporting information S2). These phases correspond to the stacking disorder phase (orange in Figure 2(a)) consisting of alternating layers of cubic diamond and hexagonal diamond and a diaphitine like distorted hexagonal diamond (purple in Figure 2(a)) with two different bond lengths at 1.47Å and 1.53Å. While both the stacking disorder and the diaphitine-like lonsdaleite phases are widely believed to be metastable [17,52,9,43], our calculations show that they lie near the experimental phase boundary. Incidentally, our theoretically predicted ranges of stability for stacking disorder also matches with the experimental conditions under which they are observed [8,23,30,16,34]. For instance, hexagonal diamond (lonsdaleite) containing varying fraction of cubic diamond [52], alternatively described as a stacking disorder diamond or faulted and twinned cubic diamond [43] has been experimentally synthesized under static compression [8,60,23,30,45], HPHTtreatment [33,71] or shock compression [71,17,52,9]. These observations are not surprising considering that, near the phase boundaries, the energetic differences between the experimentally reported metastable phases and the stable (cubic diamond, graphite) phases is only 0.3 × k B T or less (see supporting information). Such a small difference increases the likelihood (discussed below) of forming these phases at high temperatures.

Metastable phase diagram
We next construct the metastable phase diagram of carbon. While the phases represented in the equilibrium phase diagram exhibit minimum free energy at a given pressure and temperature, metastable phases are located in valleys of the high dimensional free energy landscape with respect to the structural parameters (refer schematic Figure 1 . We therefore construct a ∆G(T, P ) surface, the projections of which can be used to derive the metastable phase diagram as a function of the degree of non-equilibrium from the corresponding equilibrium phase. We thus define the metastable phase diagram as the phase diagram obtained by projecting on the T − P plane, the phase with closest ∆G

M Sj
GSi (T, P ) value compared to a given ∆G, which is also the measure of degree of non-equilibrium, and satisfies ∆G M Sj GSi (T, P ) < ∆G. In other words, by varying ∆G, we are effectively taking slices of the overlaid free energy landscape (Figure 1(e) & Figure 4(a)) of all the structures. Experimentally, such phases can be accessed by using pulsed laser heating, in which the system undergoes phase transformation with the pulse providing the energy to transition between local minima of the free energy (Figure 1(c)). Figure 2(d) and 2(e) shows the metastable phase diagram of carbon at ∆G equal to 20 meV/atom and 45 meV/atom respectively. At a non-zero ∆G, we see the appearance of new metastable phases in the phase diagram. At much higher values of ∆G more than 20 metastable phases appear in the metastable phase diagram (section S4 in supporting information). Representing the metastability of the different phases in such a phase diagram offers a wealth of information. One can deduce the temperature-pressure ranges at which a phase is likely to be stabilized and an estimate of minimum excitation energies (from ∆G) required to synthesize a metastable phase, thus offering a systematic approach in designing experiments at favorable conditions for synthesis. It is interesting to note that, the neighboring phases in the metastable phase diagram are structurally similar suggesting, and there may exist, a low energy transition pathway connecting them. For example, the diaphite phase, is adjacent to the regular hexagonal diamond structure in the phase diagram at a ∆G= 20 meV/atom. We use the information derived from the metastable phase diagram to explain the experimental observations during laser heating induced phase transformation of hexagonal graphite in a pressurized diamond anvil cell (DAC). We perform HPHT experiment by loading a 60×20 µm single crystal graphite disk into a DAC. The pressure was first raised to 20 GPa monitored by ruby fluorescence. The graphite crystal was heated to ≈1400 K by a YAG laser at the center of crystal. Due to Gaussian distribution of laser spot, the temperature away from the center could be as low as ≈1000K, such that the center part of the sample was turned into dark transparent and the outside rim remains as dark. In this recovered sample, several metastable phases were identified by HRTEM as shown by the images in Figure 3. When pressurized, the graphite layers slide with respect to each other to form orthorhombic and rhombohedral graphite (Figure 3(a)) [54,32,40]. Around the dark rim near transparent areas, with further increase in temperature, the orthorhombic and rhombohedral graphite layers buckle to form interlayer bonds resulting in the formation of hexagonal or cubic diamond respectively [54,32,56,18,19,57,72,63,68]. In practice, both the transformation pathways occur simultaneously, resulting in an intergrowth of cubic and hexagonal diamond [20,24,43,63], also known as the stacking disorder (shown in Figure 3(c)). The diaphite-like lonsdaelite phase with two different bond lengths (Figure 3(b), supporting information section S8) was also observed after the HPHT treatment. One can explain this observation with the aid of our metastable phase diagram. The diaphite phase is easily accessible under the experimental conditions used (20 GPa, 1400 K) since it is metastable with only a ∆G=45 meV/atom (Figure 2(d); purple phase) which is ≈ k B T /3. We conjecture that graphite undergoes phase transformation, triggered by the excitation in experiments, into a accessible metastable phase which can be represented as excitation induced hopping from the global minima to a local minimum in the free energy landscape.

High Temperature High Pressure Experiments
Furthermore, we observe a new cubic-diamond like phase exhibiting the same diffraction pattern as the previously reported n-diamond [25]. New diamond (n-diamond) was proposed as a new carbon allotrope; its electron diffraction pattern matches that of cubic (Fd-3m) diamond apart from some additional reflections that are forbidden for diamond, indexed as {200}, {222} and {420}. The speculation of this new allotrope was first reported in 1991 [25], but the exact crystal structure of n-diamond has remained as a controversy despite several attempts to explain the n-diamond diffraction pattern [66,26,3,38]. Here, we attempt to explain the crystal structure of the metastable n-diamond using our metastable phase diagrams. Among all the phases that appear near the experimental conditions (≈20 GPa, 1400K) in the metastable phase diagram at ∆G = 45 meV/atom (Figure 2(e)), the diffraction pattern of S291 phase matches excellently with experiments (Figure 3(b)). The S291 phase is a cubic analog of the diaphite-like lonsdaelite phase with two different bond lengths (section S6 in supporting information). Similar to the diaphite-like lonsdaelite phase, cubic-diaphite is dynamically stable and has no imaginary phonon modes under a highly anisotropic pressure (section S6 in supporting information). Such anisotropy in pressure is present in our experiments. In the dark area where graphite was not converted into diamond, we found graphite layers are severely bent and formed into many empty pockets with a rhombus shape. Under an anisotropic pressure, the atomic plane distance becomes much shorter at these bent areas, equivalent to a huge increase in pressure in the out-of-plane direction. It is predicted that diamond nucleates at these bent areas [69].
Hence, our framework not only correctly reproduces the dominant diamond and graphite phase in the equilibrium phase diagram, but also explains the observation of metastable phases in HPHT experiments. Mapping the metastable phase diagram and inspecting the neighboring phases provides insight into possible phase transformations pathways and assists in selecting the appropriate starting material for targeted synthesis, thus accelerating computer-aided materials discovery.

Domains of relative stability
The phase diagrams discussed above were generated by comparing the free energies of all the candidate phases. Often, materials scientists find it useful to consider only a select few phases of interest and inspect their relative probability of formation. For example, one may consider only two phases involved in a phase transition and study their relative stability, to estimate the phase transition line. The probability of observing a phase at a given pressure and temperature depends on its relative stability with respect to the competing phases. Figure 4(a) shows the free energy profile, at P = 12.5 GPa, of the phases that appear in equilibrium phase diagram and the near equilibrium metastable phases S20, S28, S32, S50, S132, S353, S291 and S228. The points where any two pair of lines intersect is the phase boundary between the corresponding phases. Free energies of distinct phases are separated by a finite ∆G (also the degree of non-equilibrium). The relative stability can also be considered as the projection, on the T − P plane, of the distances between the free energy surfaces G(T, P ) for each phase. Figure 4(b) shows the map of the difference in the free energies ∆G = G hex−diamond − G diaphite . Experimentally, diaphite is observed at moderate pressures and high temperatures whereas high pressure conditions predominantly yield hexagonal diamond. Such information about the relative stability can aid in driving the synthesis process to yield a desired metastable phase, as opposed to a mixture of phases, by appropriately tuning the experimental conditions.

Domains of synthesizability
The possibility of observing a phase at a given T and P depends on whether the crystal structure is retained or deformed due to melting or dynamical instability. In other words, the synthesizability is fundamentally limited by dynamical stability. We analyze the dynamical stability of the metastable phases using the mean square deviation (MSD) of the atoms during the MD simulations. A phase is dynamically unstable if the MSD is greater than 0.1Å. Here, we define domain of synthesizability as the region in the (T, P ) space where a phase is dynamically stable. Figure 5 shows the domains of synthesizability of S32, S81 and S30. While the synthesizability of phase S32 and S125 is pressure limited, S81 is temperature limited. It should be noted that staying within the domain of synthesizability is a necessary, but not a sufficient condition for successful synthesis as there may be other factors limiting the synthesis. Similar upper limits for synthesizability, but based on the energetics of the amorphous phase, has been proposed in the past [2]. When a metastable phase is driven into a region of dynamically instability, it may transform into a neighboring metastable phase in the energy landscape or undergo melting to form an amorphous phase. Such theoretical bounds on the state variables (T, P ), where a phase is likely to be stabilized, are instructive for the synthesis a metastable phase of interest.

Test Set
Eq. of State (Test Set)

Accelerating construction of metastable phase diagrams using machine learning
The generation of metastable phase diagram relies on expensive free energy computations for a large number of competing phases. Using ML based surrogate models, we show that this process can be accelerated, and surrogate models that predict G(T, P ) can be constructed. Figure 6 presents the performance of the DNN model trained to predict G(T, P ) given only the structural information in the form of MBTR descriptor. The parity plots in Figure 6 achieved by the DNN model on the training as well as the test set. Notably, hexagonal-diamond, S228 and S20 data were part of the test set and the good DNN performance for these cases illustrates its capability to capture the free energy surface of carbon. Further, in Figure 6(c) we show that our DNN model is able to accurately predict the equation of state of phases in the test set, given only their structural information. The overall root mean square error (RMSE) across all phases in the test set was 90 meV/atom (see section S3 in supporting information). In many cases, high errors in free energy predictions were observed at relatively higher pressures, as partially captured in Figure 6. Once such a surrogate model is trained, the free energy landscape of any new phase can be predicted orders of magnitude faster using only the structural information, thus, speeding up the process of constructing metastable phase diagrams.

Conclusion
In summary, we report on an automated workflow that allows for construction of a "metastable phase diagram". We introduce an alternate representation of metastable phases and their relative stability by providing a free energy scale which helps identify both the metastable phase location and its extent of non-equilibrium. Such a representation is far more informative with regard to designing experiments and accelerating the discovery of metastable phases, which often display exotic properties. Our workflow constructs the metastable phase diagram by combining several synergistic computational approaches including a structural search based on genetic algorithms, deep learning accelerated high-throughput free energy calculations and multiclass support vector machines to classify phase boundaries. We demonstrate the efficacy of our computational approach by using a representative single component carbon system, whose equilibrium and metastable phases have been well studied in the past. We successfully predict the equilibrium phase diagrams, and use the metastable phase diagram to explain several experimentally observed metastable intermediates including diaphitine-like lonsdaelite and its cubic analog, during high-pressure-high-temperature processing of graphite in diamond anvil cell. We also use the information extracted from the metastable phase diagram to propose a cubic-diaphitine structure, as a candidate phase to explain the diffraction pattern of n-diamond. In addition, we show that the phase diagram construction can be accelerated by orders of magnitude with the help of a surrogate ML model, which can reliably predict the equation of states, given only the structural information. Our framework lay the groundwork for computer-aided discovery and design of synthesizable metastable materials. Our data-driven approach is fairly general and applicable to other chemical systems including multi-component alloys. For such systems, we envision a higher dimensional metastable phase diagram that allows exploration of metastability as a function of composition as well.

No
Create new population from genetic operations

Metastable Structures
All the structures within the ∆ cutoff is selected for further analysis

Grouping algorithm
v Group layered structures v Group structures with matching RDF and ADF peaks

Compute V(P,T)
v Discretize range of P and T to create 2D grid of (P,T) points v Compute V at every (P,T)using MD simulations

Gibbs Free Energy for every (P,T)
Gibbs Free energy = + !" + Discretized metastable phase diagram v Overlay the free energy surface of all phases (structures) v Phase corresponding to (P,T) is the one with minimum G v Slice at desired ∆ away from ground state for metastable phase diagram The DFT relaxations are done using the VASP package [14]. LAMMPS package [19] is used to relax structures using LCBOP model. Fitness of each organism (structure) in a given gene pool is evaluated as where H i is the enthalpy of the organism i, H max and H min are the maximum and the minimum enthalpy in the current pool. The gene pool is ranked according to the fitness and parent structures are selected to undergo genetic variations to produce new offspring structures for the subsequent generation of gene pool. The selection probability of each structure is based on the fitness: We define three types of genetic operations to build the subsequent generation of structures: 1. Crossover variation: This genetic variation involves two parents structures. The offspring structure is generated by slicing the parent structures across a random axis and combing the atoms on one side of the slice with the atoms on the other side in the other parent structure.
2. Structure mutation: Structure mutation involves random perturbation of the atomic coordinates and the lattice parameters 3. Number of atoms mutation: Atoms are randomly deleted or added in such a way that the constrains on inter-atomic distances and maximum number of atoms allowed.
A new generation offspring structures are generated using the above operations. The probability that a parent structure is subjected to crossover variation, structure mutation and number of atoms mutation are -0.4, 0.4 and 0.2 respectively. Each of the offspring structure has to pass a redundancy check and satisfy the above mentioned constrains before it can be added to the gene pool of the next generation. Once a new gene pool of 40 structures are obtained, the fitness of the new generation is evaluated and new set of parents are selected based on their probabilities. This cycle is repeated until the difference between the enthalpy of the best and the top N/8 structure is less than a tolerance. The tolerance we used for the case of carbon is 20 meV. We build our algorithm based on the modules and function definitions within the Genetic algorithm for structure and Phase prediction code [2]. Further details on the algorithm and the genetic variations can be found in Ref. [20,22].
Supplementary Figure 2: Evolution of the best structure in the pool We perform the independent evolutionary structure searches at P = 0 GPa, P = 10 GPa & P = 100GPa. After convergence, we build a consolidated list of distinct structures ordered according to increasing value of enthalpies. Only the phases with satisfying H < H ground + ∆H cut−of f are selected for free energy calculations. For carbon, the graphite phase is the experimental ground state with the minimum enthalpy of -7.365 eV/atom (computed using LCBOP model).

S 1.2 Grouping based of RDF and ADF
Some of the candidate structures are structurally very similar and the enthalpies vary only by a small value. For example, in the case carbon, our structure search algorithm predicts hexagonal graphite, orthorhombic graphite and rhombohedral graphite, all which only differ in their stacking patterns and have very similar structural features. Besides at high pressure and temperature conditions, it is highly probable for the layers to slide against each other and change stacking, as can be seen in Figure 3 of the main text. Hence, we group such structures with very high similarity and count them as the same phase. i.e hexagonal graphite, orthorhombic graphite and rhombohedral graphite are considered as "graphite" phase. Only the candidate phase with the least enthalpy within each group is used to compute the free energies. The grouping is done based on the radial distribution function (RDF) and angular distribution function(ADF). Any two structures with matching first two peaks of RDF and ADF are grouped together. We end up with 505 unique groups within 670 meV from the ground state graphite phase. The free energies of the structures within the same group vary only by a small value (Supplementary Figure 3). The Crystallographic Information File (CIF) for each structure is provided in a GitHub repository.

Free Energy Calculations
The temperature (0-3000 K) and pressure range (0-100 GPa), over which the phase information is desired, is discretized into a 16×16 uniformly spaced grid. The free energy of the candidate phases are computed at a temperature and pressure corresponding to each of the grid points. The Gibbs free energy can be written as where T i and P i are the temperature and pressure corresponding to a given grid point i We first compute the enthalpy (H) and density(ρ) of all phases at each of the grid point from MD simulations at the corresponding temperature and pressure (T i , P i). We thus perform 256 (16×16) MD simulations for each phase. We construct a super-cell of 3×3×3 of the initial unit cell to minimze the finite size effects. The interatomic interactions are modeled using the LCBOP [7] potential. We use a timestep of 1fs. The system is first equilibrated under an NVT ensemble with Nosé-Hoover thermostat at the target temperature of T i for 100ps before switching to an NPT ensemble, with target temperature (T i) and pressure (P i ) controlled by Nosé-Hoover thermostat and barostat respectively. The system is subsequently equilibrated for 200ps, which is sufficient for the density (volume) to converge. We simulate the system for an additional 200ps to determine H and ρ by averaging over the data obtained every 10 timestep. All MD simulations were performed using the LAMMPS package [19].
The entropic contribution to the Gibbs free energy (−T S) is determined by modeling the atomic vibrations as harmonic oscillators. The entropy of a system of harmonic oscillators can be written as: At each grid point, the phonon spectrum computed at the corresponding equilibrium density (ρ i ) obtained from the MD simulations. The phonon calculations were performed using the PHONOPY package [23]. The force matrix for the phonon calculations are obtained from the LCBOP model. The total Gibbs free energy G(T i , P i ) is obtained by summing H(T i , P i ) and −T i S i . shows the discretized and continuous equilibrium phase diagram constructed by comparing the G(T, P ) of all the candidate phases identified by our algorithm. Apart from cubic diamond and graphite, which are the dominant stable phases in the experimental phase diagram, we note the appearance of S132 and S353 near the phase boundary. Our calculations show that S132 and S353 has a range of marginal stability (∆G/k B T < 0.3) near the graphite-diamond phase boundary. Observation of metastable diamond and metastable graphite near the phase boundary has been reported in the past [3,25,8,12,18,15,26,6,21,4,5,16,17]. S132 is a stacking disorder phase consisting of alternating layers of cubic diamond and hexagonal diamond (orange in Figure 2 in main text). S353 is diaphitine like distorted hexagonal diamond (purple in Figure 2 in main text) consisting of atomic configuration with two different bond lengths at 1.47Åand 1.53ÅBoth the phases have been observed experimentally during high pressure high temperature treatment of graphite in a diamond anvil. We further inspect the ∆G between S132, S353 and the stable phases. Supplementary Figure 4 Reference Gibbs free energy (eV/atom) ML prediction (eV/atom)

S 3 Deep Neural Network
A deep neural network (DNN) was used to learn the Gibbs free energy of different phases of carbon. It consisted of 4 fully connected (dense) hidden layers with 128, 256, 512 and 64 neurons, respectively, as shown in Supplementary Figure 5(a). The input layer consisted of many-body tensor representation [11] (MBTR) of the 0 K and 0 GPa structure of a phase, and the normalized T and P value. The MBTR fingerprint was obtained using the python library Dscribe [9] with 25 dimensions for the 'k2' (min=0.1, max=2 and σ=0.1) and 25 dimensions for the 'k3' (min=0, max=180 and σ=5) type terms, each normalized individually to the Euclidean length (L 2 ). Features with zero variance throughout the data were removed, while T and P values were included as two additional features, overall resulting in a 43-dimensional input fingerprint to the DNN. The output layer consisted of a single neuron describing the DNN predicted Gibbs free energy of a phase at the input T and P values. The DNN was trained using Adam optimization algorithm [13] with the mean absolute error chosen as the loss function definition. Free energy data corresponding to 248 phases was used to train the model, while that for 30 and 43 phases was used as the validation and test set, respectively. A few important phases, such as cubic, graphite, S132, S291 and S353, were part of the training set, while others, including hexagonal, S228 and S20, were part of the test set. Since some phases were found to be dynamically unstable at different P and T conditions, caution was taken to only include those data points that correspond to reasonable free energy values without any arbitrary jumps in the free energy vs T, or free energy vs P behavior. The number of training epochs was determined by monitoring the model performance on the validation set and a dropout layer (with value of 0.3) was used after the third hidden layer for regularization purposes. The DNN code was implemented in Tensorflow [1]. The overall performance of the DNN model on the training as well as the test set is presented in Supplementary Figure 5(b).
In particular, using our surrogate ML model, we can quickly estimate the proximity of a newfound metastable phase with respect to the ground state, given only the structural information. The probability of realizing a metastable phase at a given temperature and pressure is directly proportional to exp(− ∆G  Figure 6 shows the predictions of the DNN at 12.5 GPa, G(T, P = 12.5GP a), for metastable phases the ML model has never seen during training (S228, S20 and S50 are part of test set). The error between MD computations and the ML predictions are less than 40 meV/atom. Thus, we can quickly classify a metastable phase as near-equilibrium and more likely to be synthesized, or far-from-equilibrium and less likely to be synthesized, by comparing free energies with ground state phases. HPHT samples were obtained using a diamond anvil cell (DAC). Starting material is a 60×20 µm single crystal graphite disk cut from a millimeter size crystal by micro laser drilling system and it was loaded into a hundred micron diameter rhenium gasket chamber. Pressure was monitored by ruby fluorescence. When pressure is at 20 GPa YAG, laser heated samples at the center ( 1400 K) turned dark transparent but the rim remained dark. In this work, after decompression from high pressure and temperature treatment, we opened the DACs, transferred the samples from the chamber to a clean marble mortar with a tiny pin. TEM samples were prepared by crushing the recovered sample using a marble mortar and pestle and then dispersing these crushed powders onto a holey carbon grid. Focused-ion beam (FIB) technique is also used to prepare plane-view and cross-sectional TEM specimens. Argonne Chromatic Aberration-corrected TEM (ACAT, FEI Titan 80-300ST TEM/STEM) with a field-emission gun was used to investigate the crystallographic orientation, high-resolution transmission electron microscopy (HRTEM) images from the recovered samples. The initial structure of n-diamond (S291) as identified by our evolutionary algorithm is relaxed under an anisotropic pressure of 48 GPa in the y-direction and 20 GPa in the x-and z-directions. The resulting structure is still a cubic diamond like structure with two different bond lengths of 1.56Åand 1.50 A (Supplementary Figure 8). The simulated diffraction pattern of the final structure matches well with the previously reported n-diamond structure. We next inspect the stability of the proposed structure by computing the phonon spectrum and checking for any possible imaginary modes. The phonon spectrum is computed using PHONOPY package [23] with force constants obtained form density functional perturbation theory(DFPT). The relevant high symmetry points labeled in the phonon spectrum were obtained using the algorithm described in Ref. [10] which uses the spglib library [24] to construct the brillouin zone. The structure is stable since there are no imaginary modes.

S 4 Far from equilibrium metastable phase diagrams
Cubic-diamond consists of two fcc lattices that shift along [111] diagonal direction respect to each other. When the shift distance equals to a sp 3 bond length, (200) diffraction spots extinguish since cubic-diamond has one sp 3 bond length. When the shift distance of these two fcc lattices is away from 1.54Å, the intensity at (200) diffraction spots gradually increases. In the simulated diffraction pattern using S291 structure, we can find the intensity of (200) spots are much lower than (400) spots due the small difference in these two bond lengths (1.56Å, 1.50Å). In the experiment diffraction pattern, the (200) intensity is close to the (400), indicating this n-diamond has a much smaller bond length than 1.50A (close to real graphite sp 2 bond length 1.42Å). Evidence of diaphite-like lonsdaelite phase are provided in Ref [27]. Specifically, Figure 2 of Ref [27] shows the AC-HRTEM image of lonsdaleite along [1120] with two different bond lengths (OA ≈ 1.56Å and OB ≈ 1.47Å) and Lonsdaelite phase after relaxing under anisotropic pressure. The UV Raman spectrum on the recovered sample shows three peaks (Figure 3 of Ref [27]). The DFT calculated frequencies dependence on bond lengths match with that of experiments.