An automated framework for high-throughput predictions of NMR chemical shifts within liquid solutions

Identifying stable speciation in multi-component liquid solutions is fundamentally important to areas from electrochemistry to organic chemistry and biomolecular systems. Here we introduce a fully automated, high-throughput computational framework for the accurate prediction of stable species in liquid solutions by computing the nuclear magnetic resonance (NMR) chemical shifts. The framework automatically extracts and categorizes hundreds of thousands of atomic clusters from classical molecular dynamics simulations, identifies the most stable species in solution and calculates their NMR chemical shifts via density functional theory calculations. Additionally, the framework creates a database of computed chemical shifts for liquid solutions across a wide chemical and parameter space. We compare our computational results to experimental measurements for magnesium bis(trifluoromethanesulfonyl)imide Mg(TFSI)2 salt in dimethoxyethane solvent. Our analysis of the Mg2+ solvation structural evolutions reveals key factors that influence the accuracy of NMR chemical shift predictions in liquid solutions. Furthermore, we show how the framework reduces the performance of over 300 13C and 600 1H density functional theory chemical shift predictions to a single submission procedure. A fully automated, high-throughput computational framework is presented to predict stable species in liquid solutions. This framework combines density functional theory with classical molecular dynamics simulations to compute the NMR chemical shifts.


ABSTRACT 22
Identifying stable speciation in multicomponent liquid solutions is of fundamental importance to 23 areas ranging from electrochemistry to organic chemistry and biomolecular systems. However, 24 elucidating this complex solvation environment is a daunting task even when using advanced 25 experimental and computational techniques. Here, we introduce a fully automated, high-26 throughput computational framework for the accurate and robust prediction of stable species 27 present in liquid solutions by computing the nuclear magnetic resonance (NMR) chemical shifts 28 of molecules. The framework automatically extracts and categorizes hundreds of thousands of 29 atomic clusters from classical molecular dynamics (CMD) simulations to identify the most stable 30 speciation in the solution and calculate their NMR chemical shifts via DFT calculations. 31 Additionally, the framework creates an output database of computed chemical shifts for liquid 32 solutions across a wide chemical and parameter space. This task can be infeasible experimentally 33 and challenging using conventional computational methods. To demonstrate the capabilities of our 34 framework, we compare our computational results to experimental measurements for a complex 35 test case of magnesium bis(trifluoromethanesulfonyl)imide Mg(TFSI)2 salt in dimethoxyethane 36 (DME) solvent, which is a common electrolyte system for Mg-based batteries. Our extensive 37 benchmarking and analysis of the Mg 2+ solvation structural evolutions reveal critical factors such 38 as the effect of force field parameters that influence the accuracy of NMR chemical shift 39 predictions in liquid solutions. Furthermore, we show how the framework reduces the efforts of 40 performing and managing over 300 13  formats, the framework can take query criteria to retrieve previously optimized structures from the 151 database. It can also derive a structure on the fly by either attaching a functional group or linking 152 two structures at a specific binding site. Next, the framework runs an electrostatic partial charges 153 (ESP) workflow that first converts the input structure formats to pymatgen molecule objects. The 154 ESP workflow uses this molecule object to generate a Gaussian input file with input parameters 155 specified as optional inputs to the workflow. The workflow uses default values if these parameters 156 are not provided. It then runs three sequential steps: (1) a DFT geometry optimization, (2) a 157 vibrational frequency calculation to ensure that there are no imaginary frequencies, and (3) a 158 population analysis to assign atomic charges. The framework executes the ESP workflow for each 159 component of the liquid solution. 160 Fig. 1 Scheme of the computational framework used to calculate NMR chemical shifts in solution as implemented in the MISPR high-throughput infrastructure We note that the framework is general enough to be applied to various complex liquid solutions 161 at different conditions (e.g., concentration, temperature, pressure, etc.). It requires, at minimum, 162 the concentration of species in the solution and the size and geometry of the system box to prepare 163 the multicomponent system for CMD simulations. One of the most challenging aspects of running 164 automated CMD simulations is selecting or generating accurate force field parameters. By default, 165 the framework uses the output of the ESP workflow to derive the general amber force field 166 (GAFF) 36 parameters for each species. The framework also supports other force fields allowing 167 the user to test different physical models for a specific application or system. In this case, the user 168 may input the force field parameters to the framework in the form of a Python dictionary or retrieve 169 them from our in-house database. We note that the user may bypass the ESP workflow if the ESP 170 charges have been previously calculated or other force fields are directly provided. The framework then passes the optimized geometries, force field parameters, concentrations, and information 172 about the geometry of the simulation box (e.g., lengths, shape, etc.) to the next step to build the 173 system for LAMMPS simulations. Following this, the framework runs a CMD workflow to 174 generate time trajectories of atomic positions and velocities. Configurations for common CMD 175 procedures are encoded in a set of protocols that can be used directly or altered to run any series 176 of LAMMPS calculations according to the user's needs. The default CMD configuration involves 177 energy minimization, NPT equilibration at the desired temperature and pressure, melting and 178 quenching, and NVT production runs. 179 The framework then uses the generated LAMMPS trajectory files to compute the radial 180 distribution function (RDF) between all possible pairs of particle types in the system or specific 181 pairs specified as inputs. The RDF module is part of a standalone in-house suite of Python tools 182 that we developed to extract a range of structural and dynamical properties from LAMMPS 183 trajectory and output files. The RDF defines the probability of finding a particle at a distance r 184 from another particle. More details about the RDF calculations are provided in the section 1 of the 185

SI. 186
Sampling solvation structures from the CMD step is a key component of the NMR framework. 187 Traditional NMR calculations are relatively inefficient at constructing initial guesses for molecular 188 structures. Building molecular structures by manually placing a number of molecules in the 189 solvation shell of the particle of interest is extremely time consuming 20,23,24 . In contrast, our 190 framework passes the computed RDF from the previous step to perform sampling of the first 191 solvation shell of a particle of interest in a straightforward and automated manner. In the 192 framework, the first solvation shell is defined by the cutoff distance , corresponding to the 193 position of the first minimum after the main peak of the RDF between the particle of interest and 194 other coordinating particles in the solution. In the default operation of the framework, is 195 automatically extracted from the RDF, but the user may override this by providing as an 196 optional input. Thus, a cluster representing the solvation structure is defined as the group of species 197 within of the particle. By ensemble averaging hundreds of thousands of clusters, we obtain a 198 distribution of clusters corresponding to all the possible chemical environments surrounding the 199 particle of interest in the solution. 200 Next, the framework categorizes the extracted clusters into unique configurations based on the 201 type and number of species surrounding the particle and their mode of coordination. Then, it 202 calculates the probability of each configuration as the ratio of the number of clusters that belong 203 to a specific configuration to the total number of extracted clusters. Configurations with the highest 204 probability of occurrence correspond to persistent metastable solvation structures in the solution. 205 By default, the framework selects the top configurations whose probabilities sum to more than 206 90% of the total number of extracted clusters, but the user may also select the configurations as 207 needed. The selection of the configurations is done to reduce the number of required DFT 208 calculations and their associated computational cost. It is also important to select a representative 209 cluster from each configuration since it is common that thousands of clusters with subtle 210 geometrical differences (e.g., bond lengths, orientation, etc.) belong to the same configuration. To 211 this end, the framework performs a local minimization procedure on all the clusters from the 212 selected top configurations using the MMFF94s force field 37 as implemented in the RDKit 213 library 38 . The framework then feeds the lowest-energy conformer of each configuration to an NMR 214 DFT workflow. We note that it would be infeasible to manually generate and categorize this large 215 number of structural files and account for all the possible solvation structures using conventional 216 methods that rely on chemical intuition. This task is especially challenging for chemical systems 217 that have not been previously explored in detail. 218 The NMR workflow relaxes the CMD clusters selected from the previous step, performs a 219 vibrational frequency analysis, and calculates the magnetic shielding tensor on each atom if a true 220 potential energy surface (PES) minimum is reached. The framework by default uses the 221 ωB97X 39 /def2-TZVP level of theory for performing these three sequential DFT steps. Switching 222 the functional, basis set, and other Gaussian input parameters (e.g., solvation model, numerical 223 and algorithmic parameters, etc.) is straightforward and requires the user to input them in the form 224 of a Python dictionary to the framework. The framework then performs an analysis step that stores 225 the calculation results in an NMR collection in the database or a local JSON file with all the 226 necessary metadata for future reference. Creating a local file allows the user to check outputs 227 quickly, retrieve data without accessing the database, and exchange data with other parties. An 228 example of the structure of an NMR document is shown in Fig S1. Finally, results from the 229 computational framework are compared to experimental NMR spectra to elucidate the solvation 230 structures. 231 In the NMR workflow, a series of convergence checks are performed to ensure the results are 232 as reliable as possible. For example, we implemented checks for normal termination of DFT 233 calculations and automatic inspection of the 3D structure resulting from optimization to confirm 234 connectivity matches the input structure. Once each step of the NMR workflow has terminated, 235 the output file is parsed for errors. An automatic error correction process is employed through 236 well-defined rules via the custodian package if an error is detected. If possible, the error handler 237 applies the appropriate remedy, generally by modifying the input parameters, writing a new 238 Gaussian input file, and restarting the calculation. If no remedy has been implemented for a 239 particular error or the error handler cannot interpret the encountered error, the calculation is 240 allowed to fail. The error handler improves the success rate of the calculations without relying on 241 human intervention, which would be impossible for handling large computational investigations. 242 Examples of the errors addressed are SCF failure, geometry optimization convergence, error in 243 internal coordinates, and exceeded wall time limit. 244 The framework takes solvent effects into account by two approaches. It uses an explicit 245 approach where several solvent molecules surrounding the species are correctly placed in its first 246 solvation shell since the geometries are extracted directly from CMD simulations. Second, it 247 approximates bulk solvent effects using a dielectric continuum model. This approach allows 248 incorporating a thermodynamically stable and realistic chemical environment of species compared 249 to the traditional approach, which relies on either implicit solvent models or manual prediction of 250 the possible solvation structures. Since multiple configurations are considered, collected data result 251 in various chemical shifts corresponding to different chemical environments experienced by the 252 nucleus of interest. Therefore, predictions from this approach can be compared and fitted to the 253 entire experimental NMR peak rather than just matching the peak center, especially when peak 254 and subsequently executed over computing resources. The workflow generated and managed over 264 600 input and output files and inserted more than 300 13 C and 600 1 H chemical shifts into the 265 database via a simple one-shot script. We compared our predictions to experimental data from the 266 SDBS 27 database and a previous study 28  Achieving this convergence for small molecules is relatively straightforward by combining DFT 279 or even coupled cluster calculations with large basis sets. However, this is much more challenging 280 with complexes consisting of multiple species. Therefore, there is a need to balance the cluster size 281 with the quality of the DFT level of theory. In addition, the choice of the implicit solvent model is 282 crucial for approximating the bulk solvent effect. A remarkable number of benchmarking studies 283 have been done on quantum mechanical methods for predicting properties in complex 284 multicomponent battery electrolytes similar to the test case here [41][42][43] . However, parallel studies for 285 NMR calculations for these types of systems are still in their infancy. Other factors include 286 selecting an appropriate number of molecules in the chemical reference to account for 287 intermolecular interactions and a representative conformer from each solvation environment. In 288 the following sections, we report the role of each of these factors using results obtained by the 289 framework for the Mg(TFSI)2/DME test case system. 290

Role of the force field 291
The choice of the force field parameters used in CMD simulations can significantly influence 292 the speciation observed in solution, and thus the NMR chemical shift predictions. Therefore, we 293 benchmark the most commonly used and reliable force fields for liquid solutions, including GAFF 294 (FF1), non-polarizable OPLS 44  The simulation density ( ) using the three force fields, shown in Table S2   Overall, we find significant differences in the type and distribution of these structures among the 318 tested force fields. For example, the most probable solvation structure predicted with FF1 involves 319 one DME molecule in bidentate configuration and four TFSIanions in monodentate configuration. 320 In addition, rather than forming a single stable solvate like in the case of FF3, the distribution of 321 coordination environments for the cation with FF1 is much more heterogeneous and involves 322 configurations dominated by the anion. With FF2, the electrostatic interaction with the anion is 323 slightly suppressed, and the most probable solvation shell is composed of two DME solvents and 90% and 30%, respectively (Fig S9). 338 The discrepancies in the predicted properties are not particularly a problem of a specific force 339 field or the Mg(TFSI)2/DME system, but rather due to a lack of accounting for the critical 340 interactions in the non-polarizable simulations. The predicted properties using FF3 are the most 341 consistent with previous experimental 23, 26 and computational 46 (Table 1) is also deemed to be in 376 satisfactory agreement with experimental data. Therefore, multiple Mg 2+ structures that are 377 entirely dissociated from the anion are possible in the solution. Excluding configuration 4, the 378 increase in the ion-dipole interaction between Mg 2+ and TFSIin the following order: configuration 379 1 < configuration 2 < configuration 3 < configuration 6 < configuration 5 leads to the observed 380 monotonic upfield shift in the corresponding 25 Mg chemical shift. The presence of loosely packed 381 clusters of [Mg(DME)n] (n ≤ 2), i.e., configuration 4, is attributed to the high degree of freedom 382 and structural flexibility of DME. This type of configuration has been reported to be favorable at 383 lower concentrations due to lower electrostriction (reduced solvent volume in the Mg 2+ solvation 384 shell relative to the bulk) and diminished entropy loss 23 . On the contrary, higher concentrations 385 (0.51 M) such as the one used in this study lead to closer distances between Mg 2+ ions, resulting 386 in stronger electrostatic interactions and dampened DME motion, thus favoring fully solvated 387 clusters (n = 3, configuration 1). This behavior is consistent with the low probability of 388 configuration 4 and the predicted 25 Mg chemical shift of this configuration, which is far from the 389 experimental peak center (Fig 3). 390  'free CH3') and DME coordinated to Mg 2+ (labeled 'bound CH3') from DFT predictions and 401 experimental measurements. Similar plots for 13 C shifts assigned to CH2 and 1 H shifts assigned to 402 CH3 and CH2 of both types of DME molecules are shown in Figs S11-S13, respectively. While 403 free and bound DME molecules are distinguishable from experimental and predicted 13 C and 1 H 404 NMR chemical shifts, it is impossible to differentiate between bound DME at different 405 configurations identified in this work. The spectroscopic differences between the structures may 406 be subtle (see, for example, Table 1 for 13 C and 1 H chemical shifts in different configurations). On 407 the contrary, 25 Mg chemical shifts can be utilized for this purpose, whereby changes in charge 408 density localization on different Mg 2+ complexes directly alter the screening effects experienced 409 by the 25 Mg nucleus, thus giving rise to different NMR responses. As displayed in Fig 4 and Fig  410   S11, the highest deviation from experimental 13 C shifts are obtained with the 6-31+G* and 6-411 311++G** basis sets combined with any tested density functional. The basis set from the 'def2' 412 family of Alrichs and coworkers 53 , particularly in combination with ωB97X, leads to 13 C NMR 413 chemical shift error that approaches the underlying uncertainty in experimental measurements 414 (Table 1). Fig S12 and S13 indicate that for 1 H chemical shifts, M06-2X/def2-TZVP outperforms 415 the other tested levels of theory with absolute errors between 0.01 and 0.3 ppm (Table 1). 416 Fig. 4 Strip plot of the computed and experimental 13 C NMR chemical shifts assigned to CH3 of DME coordinated to Mg 2+ (labeled Bound CH3) and CH3 of free DME (labeled Free CH3). For color code of 'Bound CH3', please refer to Fig 3. Results from each DFT functional are shown with the basis sets in the following order: 6-31+G*, 6-311++G**, and def2-TZVP We conclude that the choice of the basis set has the highest impact on the accuracy of NMR 417 chemical shift predictions. The 6-31+G* basis set is ruled out as a suitable basis set for NMR 418 calculations of complexes similar to those studied herein due to its degraded accuracy compared 419 to other basis sets, despite its lower computational cost (see Fig S14 for timings). For 25 Mg and 420 13 C chemical shifts, the ωB97X/def2-TZVP level of theory is recommended if computational 421 resources are available as its remarkable accuracy and the applicability of def2-TZVP to broader 422 chemical systems make it well worth the additional cost. If computational resources are limited, 423 M06-2X/6-311++G** is recommended for 25 Mg shifts as its cost is not prohibitive while still 424 predicting the correct Mg 2+ solvation structure. Finally, M06-2X with def2-TZVP or 6-311++G** 425 are recommended for 1 H chemical shift predictions. 426

Effect of geometry optimization 427
To examine the possibility of making DFT calculations more affordable, we calculated the 428 25 Mg chemical shift of 33 pre-relaxed clusters extracted from CMD simulations. We then compare 429 their deviation from calculations utilizing optimized geometries at the same level of theory (Fig  430   S15). We find a mean absolute deviation of ~ 37.6 ppm between the two types of calculations, 431 with a systematic downfield shift from calculations utilizing fully optimized structures. This result 432 is not surprising due to the sensitivity of the 25 Mg nucleus to subtle differences in the local structure 433 and coordination environment. Therefore, relaxing the structures ensures that 'reasonable 434 geometries' are used, and therefore is a prerequisite for obtaining accurate NMR chemical shifts 435 that are comparable to experimental measurements. inevitably exist in DME solution, calculations on (DME)n (n = 1 -4) are carried out for predicting 447 the 13 C and 1 H chemical shifts of DME molecules in the bulk solution, and (DME)2 is found to 448 result in bulk CH3 and CH2 chemical shifts that reproduce the experimental data. All calculated 449 isotropic shielding constants for H2O, DMSO, and DME clusters are included in the dataset 450 associated with this work. 451 Role of the implicit solvation model 452 In addition to the explicit solvent molecules modeled in the Mg 2+ first solvation shell, an 453 implicit model is used to incorporate long-range electrostatic effects. Implicit solvent models have 454 the advantage of reducing the number of degrees of freedom of the environment (solvent), thereby 455 decreasing the computational cost to describe the dielectric continuum outside the solute cavity. 456 SMD is reliable in many applications 54 and therefore is compared to the PCM results in this work. 457 As evident from the data in Table 1 conformers that would need to be considered. 500 results reported in this work and the previously reported SCXRD results 26 . The benchmark test 518 case shows that our procedure can generate reliable results that can facilitate NMR deconvolution 519 assignments to determine ionic association interactions within liquid solutions similar to those 520 reported in this work. An extension of this framework is under development and will be 521 successfully added to the existing one. Features that will be supported include the ability to explore 522 the role of the second solvation shell and coupling this strategy with a more detailed analysis of 523 the exchange dynamics in the solution. In addition, support for performing automated polarizable 524 CMD simulations using the thermalized Drude dipole method as implemented in LAMMPS will 525 be added. The current and extended framework will be used to study other monovalent and 526 multivalent electrolytes whose structure is not intuitive or when the chemical and parameter spaces 527 are too large for human search using conventional non-automated methods. Data collected from 528 the framework is expected to provide fingerprints to guide future experimental investigations of 529 liquid solutions with optimal properties. 530

METHODS 531
Computational 532 CMD simulations are performed using the LAMMPS simulation package 34 version 3Mar2020 533 (http://lammps.sandia.gov). Initial configurations of ions in the solvent are first obtained by 534 randomly packing the molecules in a cubic box of size 5 ⨉ 5 ⨉ 5 nm 3 with periodicity in XYZ 535 directions using the PACKMOL package 56 . We consider MgTFSI2 in DME at a salt-to-solvent 536 ratio of 1:18. In FF1, i.e., GAFF 36 parameterization, TFSIand DME parameters are obtained by 537 first generating the electrostatic potential of single molecules in Gaussian 16 33  which are based on GAFF parameterization 36 . Lastly, based on FF2, we build FF3 544 parameterization that includes polarization effects via the classical Drude oscillators model 45,62 . 545 Drude particles are attached to all atoms, excluding hydrogen and Mg 2+ due to their relatively small 546 polarizabilities. Atomic polarizabilities and charges for TFSI are based on the APPLE&P force 547 field 63 , whereas those for DME are taken from work on poly(ethylene oxide) 64 . Nonbonded 548 parameters for Mg 2+ cations are adapted from AMOEBA-PRO-13-FF 65 . Force field parameters 549 used in this work are listed in Tables S3-S7.  550 Lennard Jones interactions are truncated at a cutoff distance of 1.2 nm, and the particle-particle 551 particle-mesh (PPPM) 66 method is used to handle long-range electrostatic interactions. With FF3, 552 a Thole damping factor 67 of 1.0 is used to smear the neighboring induced dipoles located on the 553 same molecule and prevent the 'polarization catastrophe' 68 . Initial structures are subjected to a two-554 step energy minimization, first using the steepest descent algorithm employing convergence 555 criteria of 1,000 kcal/mol Å and then using a conjugated-gradient minimization scheme with an 556 energy convergence criteria of 10 kcal/mol Å. The two-step minimization allows for the release of 557 strained contacts in the initial configuration. Isothermal-isobaric simulations (NPT) are performed 558 to obtain the correct density on the minimized system using a Nosé/Hoover temperature thermostat 559 and pressure barostat to maintain the temperature at 298.15 K and the pressure at 1 atm for 2 ns. 560 With FF3, Drude particles are thermalized at a lower temperature relative to Drude cores to avoid 561 fast vibrations of the small reduced masses, thus allowing the use of a reasonable time step. The 562 system is then melted to 500.15 K for 2 ns and subsequently quenched to 298.15 K for 3 ns. 563 Following that, canonical ensemble (NVT) simulations are performed for 50 ns using a time step 564 of 0.001 ps at 298.15 K to equilibrate the system. Molecular trajectories are sampled every 5 ps, 565 resulting in 10,000 snapshots, from which properties of interest are calculated. 566 All DFT calculations are performed using Gaussian 16 33 . Magnetic shieldings are calculated 567 for the extracted Mg 2+ clusters, ranging in size from 33 to 78 atoms. The benchmark study is 568 performed with twelve combinations of functionals and basis sets (Fig 4) chosen due to their broad 569 application in the NMR literature. An ultrafine integration grid is employed, and van der Waals 570 interactions are treated using Grimme dispersion correction (D3) 69  are the chemical shift and the isotropic shielding constant of the nucleus of interest in 577 a given cluster, respectively, and is the calculated isotropic shielding constant of the same 578 nucleus in a suitable reference compound. We use an Mg 2+ ion coordinated octahedrally by six 579 water molecules, dimethyl sulfoxide, and water, as the chemical references for 25 Mg, 13 C, and 1 H, 580 respectively. To reduce systematic errors, we use secondary references (TMS) by adding 39.5 and 581 4.7 ppm to the calculated chemical shifts of carbon and proton, respectively. These values 582 correspond to the experimental chemical shifts of the secondary references relative to the primary 583 standards. We again stress that all the steps described here are automated within our computational 584 framework except for the polarizable CMD simulations. 585

Experimental 586
Mg(TFSI)2 (99.5%, Solvionic) were dried for 48 hours under vacuum at 180 C, and the DME 587 solvent (Battery-grade, Gotion) was further dried over activated 3 Å molecular sieves in a 588 glovebox until its water content was determined to be below 10 ppm using a Karl-Fisher Titrator 589 (Metrohm). Mg(TFSI)2/DME solutions were prepared inside a glovebox filled with nitrogen right 590 before NMR measurements. 1 H and 13 C NMR measurements were performed on a Varian DDRS 591 spectrometer with a 17.6 T magnet using a broad-band (BBO) probe with 1 H and 13 C Larmor 592 frequencies of 748.1 and 188.1 MHz, respectively. The 90˚ pulse widths were 16 µs for 1 H and 16 593 µs for 13 C. 1 H spectra were collected using 30° pulses with a transition number of 16 and a recycle 594 delay of 20 s with a coaxial tube holding Mg(TFSI)2/DME solution and an outer NMR tube holding 595 D2O (99.9%, from Sigma Aldrich) as an external reference at 4.77 ppm. 13 C spectra were collected 596 using 30° pulses with averaging of 1024 transients and a recycle delay of 12 s using a thin-wall 5 597 mm NMR tube. 25 Mg NMR spectra were collected at a 14.1 T magnet (Varian DDR spectrometer) 598 with a 25 Mg Larmor frequency of 36.7 MHz and a 90˚ pulse width of 20 µs. A small tip angle of 599 15° with a recycle delay of 0.1 s was used and 128,000 transients were acquired. In order to 600 minimize the spectrometer drift effect on chemical shift, DMSO-d6 and 5 M MgCl2 were used to 601 reference 13 C (39.52 ppm) and 25 Mg (0 ppm), respectively, right before each NMR measurement. 602

DATA AVAILABILITY 603
The dataset used to generate the results in this work along with the optimized 3D structures in 604 XYZ format are available in the repository at https://github.com/rashatwi/nmr-dataset. 605

CODE AVAILABILITY 606
The open-source LAMMPS-code is used in the CMD simulations while the proprietary Gaussian-607 code is primarily used in the DFT calculations. The framework shown in Fig 1 is implemented 608 using the MISPR infrastructure, which defines, executes, manages, and stores DFT and CMD 609 workflows. The codes used in this work will be made publicly available with the future release of 610 the MISPR package. 611

ACKNOWLEDGMENTS 612
The authors thank Xiaohui Qu (Brookhaven National Laboratory) for the helpful discussions.