Development of a physically-informed neural network interatomic potential for tantalum

Large-scale atomistic simulations of materials heavily rely on interatomic potentials, which predict the system energy and atomic forces. One of the recent developments in the field is constructing interatomic potentials by machine-learning (ML) methods. ML potentials predict the energy and forces by numerical interpolation using a large reference database generated by quantum-mechanical calculations. While high accuracy of interpolation can be achieved, extrapolation to unknown atomic environments is unpredictable. The recently proposed physically-informed neural network (PINN) model significantly improves the transferability by combining a neural network regression with a physics-based bond-order interatomic potential. Here, we demonstrate that general-purpose PINN potentials can be developed for body-centered cubic (BCC) metals. The proposed PINN potential for tantalum reproduces the reference energies within 2.8 meV/atom. It accurately predicts a broad spectrum of physical properties of Ta, including (but not limited to) lattice dynamics, thermal expansion, energies of point and extended defects, the dislocation core structure and the Peierls barrier, the melting temperature, the structure of liquid Ta, and the liquid surface tension. The potential enables large-scale simulations of physical and mechanical behavior of Ta with nearly first-principles accuracy while being orders of magnitude faster. This approach can be readily extended to other BCC metals.


Introduction
The critical ingredient of all large-scale molecular dynamics (MD) and Monte Carlo (MC) simulations of materials is the classical interatomic potential, which predicts the system energy and atomic forces as a function of atomic positions and, for multicomponent systems, their occupation by chemical species. Computations with interatomic potentials are much faster than quantum-mechanical calculations explicitly treating the electrons. The computational efficiency of interatomic potentials enables simulations on length scales up arXiv:2101.06540v1 [cond-mat.mtrl-sci] 16 Jan 2021 to ∼ 10 2 nm (∼ 10 7 atoms) and time scales up to ∼ 10 2 ns.
Interatomic potentials partition the total potential energy E into a sum of energies E i assigned to individual atoms i: E = i E i . Each atomic energy E i is expressed as a function of the local atomic positions R i ≡ (r i1 , r i2 , ..., r in i ) in the vicinity of the atom. The form of the potential function must ensure the invariance of energy under rotations and translations of the coordinate axes, and permutations of the atoms. The partitioning into atomic energies makes the total energy computation a linear-N procedure (N being the number of atoms), enabling effective parallelization by domain decomposition. Physically, this partitioning is only justified for systems with short-range interactions. The potential function Φ in Eq.(1) additionally depends on a set of adjustable parameters p = (p 1 , ..., p m ), which are optimized by training on a reference database. Once the optimization is complete, the parameters are fixed once and for all and used in all subsequent simulations. The atomic forces required for MD simulations are obtained by differentiation of the total energy with respect to atomic coordinates. Potentials can be divided into two categories according to their intended usage. Generalpurpose potentials are trained to reproduce a broad spectrum of properties that are most essential for atomistic simulations. The reference structures must be diverse enough to represent the most typical atomic environments occurring in simulations. Once published, the potential is used for almost any type of simulation of the material. Special-purpose potentials are designed for one particular type of simulation and are not expected to be transferable to other applications.
Two major classes of potentials are the traditional potentials and the emerging class of machine-learning (ML) potentials [1][2][3][4]. The traditional potentials use a functional form of Φ(R i , p) in Eq.(1) that reflects the basic physics of interatomic bonding in the given material. Accordingly, such functional forms are specific to particular classes of materials. For example, the embedded atom method (EAM) [5][6][7], the modified EAM (MEAM) [8], and the angular-dependent potential (ADP) [9] are designed for metallic systems. The Tersoff [10][11][12] and Stillinger-Weber [13] potentials were specifically developed for strongly covalent materials such as silicon and carbon. Traditional potentials depend on a small (∼ 10) number of parameters, which are trained on a small database of experimental properties and first-principles energies or forces. The accuracy of traditional potentials is limited compared to both the first-principles calculations and the ML potentials. However, due to the physical underpinnings, traditional potentials often demonstrate reasonable transferability to atomic configurations lying well outside the training dataset. As long as the nature of chemical bonding remains the same as assumed during the potential construction, the predicted energies and forces may not be very accurate but at least remain physically meaningful. Most of the traditional potentials are general-purpose type.
The ML potentials are based on a different philosophy. The physics of interatomic bonding is not considered beyond the principle of locality of atomic interactions and the invariance of energy. The potential function (1) is a high-dimensional nonlinear regression implemented numerically. This function depends on a large (∼ 10 3 ) number of parameters, which are trained on a database containing 10 3 to 10 4 supercells whose energies or forces (or both) are obtained by high-throughput density functional theory (DFT) [14,15] calculations. An ML potential computes the energy in two steps. First, the local position vector R i is mapped onto a set of local structural parameters G i = (G i1 , G i2 , ..., G iK ), which encode the local environment and are invariant under rotations, translations, and relabeling of atoms. Behler and Parrinello [16] proposed that the size K of the descriptors G i be the same for all atoms, even though the number of neighbors n i can vary from one atom to another. At the second step, the K-dimensional descriptor space is mapped onto the 1D space of atomic energies. This mapping is implemented by a pre-trained nonlinear regression R. Thus, the atomic energy calculation can be represented by the formula R i → G i R → E i . Several regression methods have been employed, such as the Gaussian process regression [17][18][19][20][21][22][23], the kernel ridge regression [24][25][26], artificial neural networks (NN) [1,[27][28][29][30][31][32][33][34][35][36][37][38], the spectral neighbor analysis [39][40][41], and the moment tensor potentials [42].
The development of an ML potential is a complex high-dimensional problem, which is solved by applying ML methods for the reference database generation, training, and error quantification. With the large number of parameters available, an ML potential can be trained to reproduce the reference database within a few meV/atom, which is the intrinsic uncertainty of DFT calculations. Since the potential format is independent of the nature of chemical bonding, ML potentials are not specific to any particular class of materials. The same procedure is applied to develop a potential for a metal, nonmetal, or a mixedbonding system. However, the high accuracy and flexibility come at a price: ML potentials are effective numerical interpolators but poor extrapolators. Since the energy and force predictions are not guided by any physics, extrapolation outside the domain of known environments is unpredictable and often unphysical. Nearly all ML potentials are specialpurpose type; development of general-purpose ML potentials is an extremely challenging task [22,43].
The recently proposed physically-informed neural network (PINN) method [44] takes the best from both worlds by integrating an ML regression with a physics-based interatomic potential. Instead of directly predicting the atomic energy, the regression predicts the best set of potential parameters p i appropriate to the given environment. The potential Φ(R i , p i ) then computes the atomic energy using the predicted parameter set p i . Thus, the formula of the method is ( Fig. 1a) The method drastically improves transferability to new environments because the extrapolation is now guided by the physics embedded in the interatomic potential rather than a purely mathematical algorithm. This general idea can be realized with any regression method and any meaningful interatomic potential. PINN is a particular realization of this approach using a NN regression and an analytical bond-order potential (BOP) [44]. The latter has a functional form general enough to work for both metals and nonmetals. Specifically, the potential represents pairwise repulsions and attractions of the atoms, the bond-order effect (bond weakening with the number of bonds), the angular dependence of the bond energies, the screening of bonds by surrounding atoms, and the promotion energy.
The interactions extend to neighbors within a 0.5 to 0.6 nm distance with a smooth cutoff. More details about the BOP potential can be found in the Methods section.
The original PINN formulation [44] has been recently improved by introducing a global BOP potential trained on the entire reference database. Since the optimized parameter set (p 0 ) is small, the error of fitting is relatively large (∼ 10 2 meV/atom). A pre-trained NN then adds to p 0 a set of local "perturbations" δp i = (δp i1 , ..., δp im ) to obtain the final parameter set p i = p 0 + δp i . The latter is used to predict the atomic energy E i = Φ(R i , p 0 + δp i ). In this scheme, the energy predictions are largely guided by the global BOP potential, which provides a smooth and physically meaningful extrapolation outside the training domain. The magnitudes of the perturbations are kept as small as possible. The same DFT level of accuracy is achieved during the training as in the original PINN formulation [44], no computational overhead is incurred, but the transferability is improved significantly.
The modified PINN method has been recently applied to develop a general-purpose ML potential for the face-centered cubic (FCC) Al [43]. Here, we demonstrate that the method can also generate highly accurate and transferable general-purpose potentials for body-centered cubic (BCC) metals. We chose tantalum as a representative transition BCC metal, in which the interatomic bonding has a significant directional component due to the d-electrons. In addition to many structural applications of Ta and Ta-based alloys [45,46], porous Ta is a promising biomaterial for orthopedic applications due to its excellent biocompatibility and a favorable combination of physical and mechanical properties [47]. This work paves the way for the development of PINN potentials for other BCC metals in the future.

Results
The potential development. The reference database has been generated by DFT calculations employing the Vienna Ab Initio Simulation Package (VASP) [48,49] (see the Methods section for details). The database consists of the energies of 3,552 supercells containing from 1 to 250 atoms. The supercells represent energy-volume relations for 8 crystal structures of Ta, 5 uniform deformation paths between pairs of structures, vacancies, interstitials, surfaces with low-index orientations, 4 symmetrical tilt grain boundaries, γ-surfaces on the (110) and (211) fault planes, a 1 2 [111] screw dislocation, liquid Ta, and several isolated clusters containing from 2 to 51 atoms. Some of the supercells contain static atomic configurations. However, most are snapshots of ab initio MD simulations at different densities, and temperatures ranging from 293 K to 3300 K. The BCC structure was sampled in the greatest detail, including a wide range of isotropic and uniaxial deformations. The database represents a total of 161,737 highly diverse atomic environments occurring in typical atomistic simulations. More detailed information about the database can be found in Supplementary Tables 1 and 2. About 90% of the supercells, representing 136,177 environments, were randomly selected for training. The remaining 10% were set aside for cross-validation.
The potential training process involves several hyper-parameters describing the NN architecture, the number and types of the local structural parameters, and the regularization coefficients in the loss function. We tested many combinations of these parameters before choosing the final version. We emphasize that, for almost any choice of the hyperparameters, the potential could be trained to an error of about 3 meV/atom. However, the predicted properties of Ta were different. Even different random initializations of the NN's weights and biases resulted in slightly different combinations of Ta properties. Hundreds of initialization-training cycles with automated property testing had to be performed before a potential that we deemed the best was selected. The hyper-parameters and the properties reported below refer to the final version of the potential.
As the local structural descriptors, we chose products of a radial function and an angular function. The radial function is a Gaussian peak of width σ = 1Å centered at radius r 0 and smoothly truncated at a cutoff r c = 4.8Å within the range d = 1.5Å (see the Methods section). The angular part is a Legendre polynomial P l (cos θ ijk ), where θ ijk is the angle between the bonds ij and ik. The size of the descriptors G i is K = 40, corresponding to the set of Gaussian positions r 0 = {2.4, 2.8, 3.0, 3.2, 3.4, 3.6, 4.0, 4.4}Å and the Legendre polynomials of orders l = 0, 1, 2, 4 and 6.
The NN has a feed-forward 40×32×32×8 architecture with 32 nodes in each of the hidden layers and 8 nodes in the output layer. The latter corresponds to the m = 8 perturbations to the BOP parameters delivered by the NN. The total number of weights and biases in the NN is 2,632. The loss function represents the mean-square deviation of the predicted supercell energies from the DFT energies, augmented by two regularization terms as detailed in the Methods section. The NN's weights and biases were optimized by the L-BFGS unconstrained minimization algorithm [50] to reach the root-mean-square error (RMSE) of 2.80 meV/atom. Fig. 1b demonstrates the accurate and unbiased agreement between the PINN and DFT energies over a 13 eV/atom wide energy interval. Although atomic forces were not used during the training, the forces predicted by the potential were compared with DFT forces after the training. A strong correlation is observed ( Supplementary Fig. 1) with the RMSE of about 0.3 eV/Å. Note that the comparison includes forces as large as 20 eV/Å. 10-fold cross-validation was performed to verify that the database was not overfitted. At each rotation, the set-aside dataset mentioned above replaced a similar number of supercells in the training set. The training was repeated, and the RMSE of the potential was computed on the validation dataset unseen during the training. The RMSE of validation averaged over the 10 rotations is 2.89 meV/atom.
Properties of BCC Ta. The Ta properties predicted by the PINN potential were computed mainly with the ParaGrandMC (PGMC) code [51][52][53] and compared with DFT calculations performed in this work. For consistency, the property calculations used the same density functional and other DFT parameters as for the reference database generation. Table I Fig. 2). This agreement makes the potential suitable for atomistic simulations of shock deformation of Ta. As another test, the BCC lattice was sheared in the [111] direction parallel to the (112) plane, transforming the BCC lattice to itself along the twinning or anti-twinning deformation paths. The energy variation along both paths accurately follows the DFT calculations ( Fig. 3b), which demonstrates the ability of the potential to model deformation twinning in Ta.
The potential accurately reproduces the DFT vacancy formation and migration energies (Table I). We tested five different configurations of self-interstitial defects and found excellent agreement between the PINN and DFT energies in all cases. Both PINN and DFT predict the [111] dumbbell to be the lowest energy state. Due to its ability to describe the point-defect properties on the DFT level of accuracy, the potential can be safely used to model diffusion-controlled processes and radiation damage in Ta. The surface energies predicted by the potential are close to the DFT values (Table I). Furthermore, Fig. 4 and Supplementary Fig. 3 show that the potential faithfully reproduces the surface relaxations for all four crystallographic orientations tested here. Surface properties of Ta are important for catalytic applications and the biocompatibility of porous structures [47].
Mechanical behavior of Ta is controlled by the 1 2 [111] screw dislocations, which have low mobility caused by the non-planar core structure [54][55][56]. In turn, the core structure depends on the shape of the γ-surface, representing the energy of a generalized stacking fault parallel to a given crystal plane as a function of the displacement vector. For the screw dislocations in BCC metals, the relevant fault planes are (110) and (211). DFT γsurfaces for these two planes were included in the reference database during the potential development (Fig. 5a,b). The potential accurately reproduces both γ-surfaces, as illustrated by select cross-sections shown in Fig. 5c,d. The cross-sections along the [111] direction are especially important as they determine the lowest-energy core structure. The computed cross-sections predict a non-generate core. To test this prediction, we directly computed the relaxed dislocation core structure using both the potential and the DFT (see the Methods section). The Nye tensor plot [57] in Fig. 6a confirms that the core computed with the PINN potential indeed has a non-generate structure. The Nye tensor plot obtained by DFT calculations is not shown as it looks virtually identical to Fig. 6a. Non-degenerate core structures were also found in previous DFT calculations for BCC transition metals [58]. By contrast, most of the traditional potentials incorrectly predict a degenerate core with a different symmetry. Many of them also predict a spurious local minimum on the γ-surface [59].
The dislocation glide mobility depends on the Peierls barrier, defined as the energy barrier that a straight dislocation must overcome to move between two adjacent equilibrium positions. We computed the Peierls barrier of the dislocation using the same supercell setup as in the dislocation core calculations. The minimum energy path between the fullyrelaxed initial and final dislocation positions was obtained by the nudged elastic band (NEB) method [60,61]. The PINN and DFT calculations give nearly identical results for the energy along the transition path, showing a single maximum in the middle (Fig. 6b). The height of the barrier, normalized by the Burgers vector's magnitude, is in close agreement with previous DFT calculations [58].
Alternate crystal structures of Ta. In addition to the ground-state BCC structure, the reference database contained the equations of state (energy versus atomic volume) for seven more crystal structures. The PINN potential accurately reproduces the DFT equations of state for all seven structures, including the low-energy A15 and β-Ta structures that compete with the BCC structure for the ground state (Fig. 7). Furthermore, the potential continues to predict physically reasonable behavior well outside the volume range covered by the DFT points. The energy continues to rise monotonically under strong compression and smoothly approaches zero at the cutoff distance between atoms ( Supplementary Fig. 4). This behavior illustrates the physics-based transferability of the PINN model guided by the BOP potential. Table II shows the results of additional tests. The potential predicts the ground state energies of a dimer, a trimer, and a tetrahedron in close agreement with DFT calculations, even though these clusters were represented in the reference database by a small number of supercells. As another transferability test, we performed DFT and PINN calculations of equilibrium energies for four relatively open crystal structures, including the A7 structure typical of nonmetals. Although the DFT energies of these structures were not included in the reference database, they are predicted by the potential reasonably well (Table II).
To sample atomic configurations away from the stable and metastable crystal structures, we included four volume-conserving deformation paths connecting such structures.
The tetragonal, trigonal, orthorhombic, and hexagonal paths continuously transform the BCC structure to FCC, HCP, SC, or body-centered tetragonal (BCT) structures (Fig. 8). Each path is characterized by a single deformation parameter p defined in Ref. [56]. The hexagonal path combines a homogeneous deformation with a simultaneous shuffling of alternate close-packed atomic planes, while the remaining paths are fully homogeneous. Fig. 8 shows that the PINN potential accurately reproduces the energy along all four paths.
The liquid phase and solid-liquid coexistence. We computed the liquid Ta structure by NVT MD simulations using the PINN potential and DFT at three temperatures: below, above, and near the melting point (see the Methods section for computational details). The potential perfectly reproduces the radial distribution function and the bond-angle distribution, as demonstrated in Fig. 9 for the temperature of 3500 K and in Supplementary Figs. 5 and 6 for two other temperatures.
The melting point calculation used the solid-liquid phase coexistence method [52,[62][63][64]. A simulation block containing the solid and liquid phases separated by a planar interface was coupled to a thermostat. In the canonical ensemble, one of the phases always grows at the expense of the other, depending on the temperature. MD simulations were performed at several temperatures around the expected melting point, and the energy increase or decrease rate was measured at each temperature (see the Methods section). Interpolation to the zero rate gave us the melting temperature T m at which the phases coexist in equilibrium ( Supplementary Fig. 7). The PINN potential predicts the melting temperature of 3000±6 K, which is reasonable but below the experimental value (3293 K). Perfect agreement with experiment was not expected since the potential was trained on DFT data without any experimental input. The only DFT calculation that we could find in the literature gave T m = 3085 ± 130 K [65], which matches our result within the statistical uncertainty.
The liquid surface tension of Ta was computed by the capillary fluctuation method (see Methods). The number obtained, γ = 1.69 J m −2 , is comparable to the experimental values of 2.07 J m −2 (from equilibrium shape of molten samples in microgravity) [66] and 2.15 J m −2 from electrostatic levitation [67]. The experimental measurements were performed on relatively large (a few mm) droplets and their accuracy was limited due to many factors, such as temperature control, surface contamination, and evaporation. The accurate value of γ reported here is one of the missing material parameters in the models of 3D printing (additive manufacturing) of Ta and Ta-based alloys [46].

Discussion
After the initial publication [44], the PINN model has been modified by reducing the role of the NN regression to only predicting local corrections to a global BOP potential. This step has helped further improve the transferability of PINN potentials to atomic configurations lying outside the reference database. As with any model, a PINN potential eventually fails when taken too far away from the familiar territory. However, the physicsguided extrapolation significantly expands the potential's applicability domain compared with purely mathematical ML models. The proposed integration of ML with a physics-based interatomic potential preserves the DFT level of the training accuracy without increasing the number of fitting parameters. As recently demonstrated [43], the computational overhead due to incorporating the BOP is close to 25%, which is marginal given that the potential is orders of magnitude faster than straight DFT calculations.
The improved transferability of the PINN model enables the development of generalpurpose ML potentials intended for almost any type of MD or MC simulations, similar to the off-the-shelve traditional potentials. This capability has already been demonstrated by developing a general-purpose PINN potential for FCC Al [43]. The present work has made the next step by extending the PINN model to BCC metals. While FCC metals can be described by EAM, MEAM, or ADP potentials reasonably well, BCC transition metals have been notoriously challenging. For example, most traditional potentials fail to predict the correct dislocation core structure and the Peierls barrier in BCC metals even qualitatively. The PINN Ta potential developed here reproduces both in excellent agreement with DFT calculations. The potential has been trained on a highly diverse DFT database and reproduces a broad spectrum of physical properties of Ta with DFT accuracy. Extrapolation of the potential to unknown atomic configurations has also been demonstrated. We foresee that this potential will be widely used in atomistic simulations of Ta, especially simulations of mechanical behavior under normal conditions and shock deformation. Using the same approach, PINN potentials for other BCC metals can be developed in the future.
The PINN method undergoes rapid development. PINN potentials for other metallic and nonmetallic elements are now being constructed. A multi-component version of PINN has been developed and will be reported in forthcoming publications, together with several PINN potentials for binary systems. The PGMC MD/MC code [51][52][53] now works with multi-component PINN potentials. PINN potentials will soon be incorporated in the Largescale Atomic/Molecular Massively Parallel Simulator (LAMMPS) [68].

Methods
DFT calculations. We used the generalized gradient approximation with the Perdew-Burke-Ernzerhof (PBE) density functional [69,70] implemented in VASP [48,49]. The calculations were performed with the kinetic energy cutoff of 410 eV and a Methfessel-Paxton smearing of order 1 with the smearing width of 0.1 eV. Monkhorts-Pack k-point meshes were used to sample the Brillouin zone, with the number of grid points chosen to ensure the energy convergence to around 1 meV/atom. Before the training, all supercell energies (per atom) were shifted by a constant such that the DFT energy of the equilibrium BCC structure is equal to the negative experimental cohesive energy of BCC Ta.
The phonon calculations utilized the phonopy package [71] with a 6×6×6 primitive BCC supercell (216 atoms) and a 6×6×6 Monkhorst-Pack k-point grid. It was verified that, with this grid, the supercell energy was converged to 1 meV/atom, and the phonon dispersion curves did not show noticeable changes with the number of k-points. For point defects, we used a cubic supercell of 128±1 atoms with full relaxation of both atomic positions and the volume. The vacancy migration energy was calculated by the NEB method [60,61] with fully relaxed supercells as the initial and final vacancy configurations. For computing the surface energies and surface relaxations, we used stacks of 24 atomic layers with a chosen crystallographic orientation. The γ-surface calculations employed similar 24-layer stacks. A generalized stacking fault was formed in the middle of the supercell by displacing the upper half of the layers relative to the lower half by small increments. After each increment, the energy was minimized by only allowing relaxations perpendicular to the fault plane.
To find the dislocation core structure, we created a dipole configuration consisting of 231 atoms with periodic boundary conditions in all three dimensions as described in Ref. [72]. By contrast to a single dislocation, this dipole configuration avoids the possible incompatibility with periodic boundary conditions. The two dislocations were first introduced into the supercell by displacing the atoms according to the anisotropic elasticity solution for the strain field. The atomic positions were then relaxed before examining the core structure.
The liquid structure calculations utilized a 250-atom cubic supercell with the experimental liquid density [73] near the melting temperature. The initial BCC structure was melted and equilibrated at the temperature of 5,000 K before quenching it to the target temperature with the supercell box fixed. After equilibration at the target temperature, an NVT MD run was performed in which 100 snapshots were saved at 60 fs intervals. The saved configurations were used to compute the average structural properties of the liquid following the procedure outlined in Ref. [73]. For the bond-angle distribution, we only considered atoms within a sphere corresponding to the first minimum of the radial distribution function.
The BOP potential. The current version of the BOP potential underlying the PINN model was described elsewhere [43] and is only briefly reviewed here for completeness. The energy assigned to atom i is given by where the summation is over neighbors j separated from the atom i by a distance r ij . The interactions are truncated at the cutoff distance r c using the cutoff function where the parameter d controls the truncation smoothness. The first term in the square brackets in Eq.(3) describes the repulsion between atoms at short separations. The second term describes chemical bonding and captures the bond-order effect through the coefficient where z ij represents the number of bonds ik formed by the atom i. The bonds are counted with weights depending on the bond angle θ ijk : The angular dependence account for the directional character of the bonds. All chemical bonds are screened by the screening factor S ij defined by where the partial screening factors S ijk represent the contributions of individual atoms k to the screening of the bond i-j: where λ i is the inverse of the screening length. The closer the atom k is to the bond i-j, the smaller S ijk , and thus the larger is its contribution to the screening. Finally, the on-site energy represents the embedding energy in metals and the promotion energy for covalent bonding. The potential functions depend on 10 parameters, 8 of which (A, B, α, β, a, h, λ and σ) are locally adjusted by the NN while d and r c are treated as global.
The training procedure. The NN weights and biases were initialized by random numbers in the interval [-0.3,0.3]. The goal of the training was to minimize the loss function with respect to the weights w κ and biases b ν . In Eq.(10), E s is the total energy of supercell s predicted by the potential, E s DFT is the respective DFT energy, N s is the number of atoms in the supercell s, N is the number of supercells in the reference database, N a is the total number of atoms in all supercells, N p is the total number of NN parameters, and τ 1 and τ 2 are adjustable coefficients. The second and third terms are added for regularization purposes. The second term ensures that the network parameters remain reasonably small for smooth interpolation. The third term controls the variations of the BOP parameters relative to their values p isn averaged over the training database. The values τ 1 = 10 −10 and τ 2 = 10 −10 were chosen from previous experience [43]. The L-BFGS algorithm implementing the minimization requires the knowledge of partial derivatives of E with respect to the NN parameters, which were derived analytically and implemented in the training code. Since the loss function has many local minima, the training had to be repeated multiple times, starting from different initial conditions. Due to the large size of the optimization problem, the training process heavily relies on massive parallel computations.
PINN calculations of Ta properties. The PINN calculations utilized the same atomic configurations as in the DFT calculations, except for larger system sizes in some cases. The phonon calculations employed the same phonopy package [71]. The thermal expansion was computed by NPT MD simulations on a 4,394-atom periodic block. This simulation block was also used for point-defect and liquid-structure calculations. A stack of 48 atomic layers was created for the surface and γ-surface calculations. The dislocation core analysis and the Peierls barriers calculation were based on the same supercells as in the DFT calculations.
The melting point calculation followed the methodology of [43]. An orthorhombic periodic simulation block had the dimensions of 39 × 42 × 202Å (16,320 atoms) and contained approximately equal amounts of the solid and liquid phases. The solid-liquid interface had the (111) crystallographic orientation and was normal to the long direction of the block. The solid phase was scaled according to the thermal expansion factor at the chosen temperature to eliminate internal stresses. The MD simulations were performed in the canonical ensemble, in which the interface cross-section remained fixed while the long dimension of the simulation block was allowed to vary to ensure zero normal stress. The solid crystallized if the temperature was below the melting temperature T m and melted if it was above T m . The phase change was accompanied by a decrease in the energy in the first case and an increase in the second. The steady-state energy rate was recorded as a function of the simulation temperature and interpolated to zero to obtain T m . The liquid surface tension of Ta was computed by the capillary fluctuation method closely following the methodology of the previous calculations for Al [43]. The liquid simulation block had a ribbon-type geometry with the dimensions of 613 × 34 × 212Å (216,000 atoms) as shown in Supplementary  Fig. 8. After equilibration at the melting temperature, 241 MD snapshots were saved at 1 ps time intervals. The capillary wave amplitudes A(k) were obtained by a discrete Fourier transformation of the liquid surface shape. The surface tension γ was obtained from the linear fit to the plot of the inverse of the ensemble-averaged spectral power |A(k)| 2 versus the wave-number squared k 2 in the long wavelength limit (Supplementary Fig. 9).  [74], b Ref. [75], c Ref. [76], d Ref. [77], e Ref. [75], f Ref. [78], g Ref. [79], j Ref. [80], h Ref. [81], i Ref. [82], k Ref. [83], l Ref. [84], m Ref. [85], n Ref. [86], o Ref. [65], * Average orientation      b Bond-angle distribution function.