A New Approach for In Silico Prediction of Oral Drug Concentration Profiles in Humans for Drug Candidates Lacking Experimental Pharmacokinetic Data

The purpose of this study is to develop a novel protocol for preliminarily predicting the concentration profile of a target drug from the PBPK model of a structurally similar template drug by combining two software packages for PBPK modeling, the SimCYP simulator and ADMET Predictor. The method was evaluated using 13 drug pairs formed from 18 drugs in the built-in database of the SimCYP software. All drug pairs have Tanimoto scores (TS) no less than 0.5. As each drug in a pair can serve as both target and template, 26 sets were studied in total. Three versions (V1, V2 and V3) of models for the target drug were constructed by replacing, step by step, the corresponding parameters of the template drug with those predicted by ADMET Predictor for the target drug. The normalized RMSE (NRMSE) was introduced to evaluate model performance. We found that the performance of the three versions depends on the structural similarity of the drug pair. For Group I drug pairs (TS ≤ 0.7), V2 and V3 perform better than V1 in terms of NRMSE; for Group II drug pairs (0.7 < TS ≤ 0.9), 8 out of 10 V3 models have NRMSE lower than 0.2, the cutoff we applied to judge whether a simulated concentration-time (C-T) curve is satisfactory, so V3 outperforms V1 and V2. For the two drug pairs belonging to Group III (TS > 0.9), V2 outperforms V1 and V3, suggesting that unnecessary parameter replacement can lower the performance of PBPK models. We also investigated how the prediction accuracy of ADMET Predictor, as well as its collaboration with SimCYP, influences the quality of PBPK models constructed using SimCYP. In conclusion, we generated practical guidance on applying two mainstream software packages, ADMET Predictor and SimCYP, to construct PBPK models for drugs or drug candidates that lack the ADME parameters needed for model construction.


Introduction
Pharmacokinetics is the study of the time course of a drug administered to the body, which includes the processes of absorption, distribution, metabolism and excretion (ADME). [1] In a pharmacokinetic (PK) study, it is usually essential to quantitatively measure the concentration of the drug in plasma at different time points, both to analyze drug behavior and to adjust the dose. Besides clinical trials, which always involve time costs and ethical considerations, the "measurement" of concentration profiles under various administration conditions can also be achieved by physiologically based pharmacokinetic (PBPK) [2][3][4] modeling with known PK parameters related to the drug's properties or its ADME profile. Moreover, computational tools for both PBPK modeling and PK parameter prediction have been developed, further reducing experimental cost. By virtue of such tools, quick and convenient in silico prediction of drug behavior in the human body can be performed without investing much effort in experiments, informing further studies of drug toxicity, dosing strategy and potential drug-drug interactions. As such, this in silico method can be particularly useful in preclinical studies and can serve as a tool to help select drug candidates that are more likely to have desirable PK profiles.
One study predicted the fraction absorbed (Fa%) of a structurally diverse group of drugs using theoretical descriptors and neural network modeling. [5] Another study applied a genetic algorithm to optimize prediction models for drug Fa%, plasma protein binding and urinary excretion. [6] Other studies predicted the Fa% of a chemical series with GastroPlus. [7,8] The Fa% prediction performance of two software platforms, SimCYP and GastroPlus, has also been evaluated with a focus on low-solubility drugs. [9] These studies concentrate on Fa% and the area under the curve (AUC), which have attracted much attention as two of the most important parameters for drugs after administration. However, these parameters cannot fully describe the shape of the drug concentration-time (C-T) profile. Systematic guidance for predicting how a drug is absorbed, distributed, metabolized and excreted over time is still lacking.
In this study, we developed a novel method to predict the concentration profile of a target compound with PBPK models constructed from the model of a structurally similar drug, which serves as the template. We used the SimCYP simulator (V19, Release 1; Sheffield, UK) [10] to construct PBPK models for a target drug by substituting only the predicted ADME parameters of the target drug for those used in the PBPK model of the corresponding template drug. We applied ADMET Predictor (V9.5), [11,12] a software package developed by Simulations Plus, Inc., to predict the ADME properties of target drugs, which include physicochemical parameters such as fraction unbound in plasma (fu) and blood-to-plasma partition ratio (B/P), and ADME input parameters such as volume of distribution (Vd), Michaelis-Menten constant (Km) and maximal metabolism rate (Vmax) of common enzymes. To validate our constructed PBPK models and to evaluate the performance of the two software tools, we selected 18 drugs from the SimCYP compound library (including substrates and inhibitors) as the template drugs. In total, 13 drug pairs were formed based on structural similarity. In each pair, one drug serves as the template and the other as the target. For the target drug, we pretended that no PBPK model was available and constructed new PBPK models based on the PBPK model of the template drug. We tested three protocols for introducing ADMET Predictor-predicted ADME properties into the template PBPK model and evaluated model performance against the observed PK profile of the target drug. The PBPK models constructed using the three protocols are called the V1, V2 and V3 models, respectively.

Methods
Drug preparation. Drugs selected for the construction of in silico PBPK models come from the built-in drug database of the SimCYP software. Simplified Molecular-Input Line-Entry System (SMILES) [13] strings of all drugs in the SimCYP built-in library, including substrates and inhibitors, were collected from the DrugBank database (https://www.drugbank.ca/). The SMILES strings were used both for structural similarity calculation on a web platform and as inputs for property prediction with ADMET Predictor.
Structure similarity calculation. Tanimoto scoring is a commonly used method to compute the fingerprint-based similarity between two compounds. [14] In this study, we applied the maximum common substructure (MCS) based Tanimoto algorithm for the similarity calculation. The Tanimoto score (TS) is defined by the function below: [15]
$$TS(X, Y) = \frac{N_Z}{N_X + N_Y - N_Z} \qquad \text{(Eq. 1)}$$
where $N_X$ and $N_Y$ are the numbers of bits in the fragment bit-strings of the two compounds, and $N_Z$ is the size of the intersection set, i.e., the number of common substructures shared by the two compounds. TS(X, Y) ranges from 0 to 1, from the lowest structural similarity to the highest (reached when the two molecules are identical). TS values were calculated using ChemMine (https://chemminetools.ucr.edu/similarity) for all combinations of drugs in the SimCYP compound database.
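As an illustration of the score described above, the following minimal Python sketch computes a Tanimoto score from feature counts, assuming the standard form TS = N_Z / (N_X + N_Y - N_Z). The function names and the set-based feature representation are hypothetical and are not taken from ChemMine or from the original workflow, which used the ChemMine web tool directly.

```python
def tanimoto_score(n_x: int, n_y: int, n_z: int) -> float:
    """Tanimoto score from feature counts.

    n_x, n_y: number of features (bits set) in compounds X and Y.
    n_z: number of features shared by X and Y.
    Assumes the standard form N_Z / (N_X + N_Y - N_Z).
    """
    if n_z < 0 or n_z > min(n_x, n_y):
        raise ValueError("shared count must lie between 0 and min(n_x, n_y)")
    denominator = n_x + n_y - n_z
    if denominator == 0:
        raise ValueError("at least one compound must have a feature")
    return n_z / denominator


def tanimoto_from_sets(features_x: set, features_y: set) -> float:
    """Same score, with each compound given as a set of feature labels."""
    shared = features_x & features_y
    return tanimoto_score(len(features_x), len(features_y), len(shared))
```

With this definition, two identical feature sets give a score of 1, and two sets with no shared feature give 0.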
Validation of PBPK models for drug templates. We first validated the PBPK models of all 18 selected drugs against their observed data from the literature. In detail, we ran simulations with the original built-in SimCYP models of those drugs. In the trial design, the dose regimen, the simulation time and the population information, including age, weight and health condition, were the same as those reported in the clinical study in which the PK data were measured. The parameters of the built-in PBPK models, such as the drug's ADME properties, were left unchanged for all drugs except Fluoxetine. Because Fluoxetine is a racemate, we adjusted some of its ADME and PK parameters according to the literature so that the predicted curve fits the experimental data much better. [16][17][18] The key ADME parameters predicted by ADMET Predictor for the 18 drugs are listed in Table S1, together with the details of the adjusted parameters of Fluoxetine. The observed drug concentration data of each template drug were extracted from published concentration-time (C-T) curves using WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/). The simulated C-T curves were then overlaid on the observed drug concentrations. The predicted PK parameters of each template drug, including the maximal concentration (Cmax), the time at which Cmax is observed (Tmax), and the area under the curve (AUC), were compared to the observed ones.
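To make the three compared PK parameters concrete, the sketch below derives Cmax, Tmax and AUC from a sampled C-T curve. It is only an illustration: the source does not state how AUC was computed (SimCYP reports these values itself), so the linear trapezoidal rule over the sampled interval is an assumption, and the function name `pk_summary` is hypothetical.

```python
def pk_summary(times, concentrations):
    """Return (c_max, t_max, auc) for a sampled concentration-time curve.

    times: sampling times in ascending order.
    concentrations: concentrations at those times.
    AUC is computed with the linear trapezoidal rule between the first
    and last sampling time (an assumption, not the authors' stated method).
    """
    times = list(times)
    concentrations = list(concentrations)
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matched (time, concentration) points")

    c_max = max(concentrations)
    # If the maximum occurs more than once, the earliest time is reported.
    t_max = times[concentrations.index(c_max)]

    auc = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (concentrations[i] + concentrations[i - 1]) / 2
        auc += width * mean_height
    return c_max, t_max, auc
```

Applying the same function to the digitized observed points and to the simulated curve sampled at the same times gives directly comparable values.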
Evaluation of inherent differences among software platforms. The quality of models constructed for target drugs is affected not only by the structural similarity between the template and target drugs, but also by the prediction quality of ADMET Predictor and by how well the two software packages work together. There may be inherent differences between software platforms, including but not limited to the training data and the algorithms used to construct their models. More importantly, the prediction accuracy of ADMET Predictor for an individual ADME parameter is unknown. Thus, we used parameters predicted by ADMET Predictor for the 18 drugs to simulate their PK profiles in SimCYP and compared the results to those obtained with the SimCYP built-in parameters. The following ADME parameters predicted by ADMET Predictor were evaluated: B/P, fu, the logarithm of the octanol-buffer partition coefficient (log Po:w), the acid dissociation constant (pKa), human jejunum effective permeability (Peff), Vd, and cytochrome P450 (CYP) metabolism parameters (Km, Vmax or CLint). The values of these ADME parameters for the 18 drugs are listed in Table S1. Following the protocol called Version 2 (V2), we replaced the log Po:w, pKa, B/P and fu values in the SimCYP drug template with the results calculated by ADMET Predictor. Following the protocol named Version 3 (V3), all of the above-mentioned ADME parameters of the template drug, that is, the V2 parameters plus Peff for absorption, Vd for distribution and the CYP metabolism parameters, were replaced by the values predicted by ADMET Predictor.
Model construction for target drugs. In total, three versions of PBPK models for a target drug were built by modifying the model of the template drug: (1) in Version 1 (V1), only the molecular weight (MW) of the template drug was changed to that of the target drug; (2) in Version 2 (V2), in addition to the MW, the parameters of the template drug listed for Version 2 above were replaced by those predicted for the target drug; (3) in Version 3 (V3), in addition to the MW and the physicochemical properties, the Peff, Vd, and CYP metabolism parameters of the template were replaced with the values calculated for the target drug, in accordance with Version 3 above. All ADME properties of the target drugs were predicted by ADMET Predictor, a software tool that can predict over 140 properties based on its built-in quantitative structure-property relationship (QSPR) models. [19] Information about the experimental subjects and the trial design used in the simulations of each target drug was taken from the corresponding clinical reports.
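The three substitution protocols can be sketched as a simple parameter overlay. The example below is illustrative only: it represents a compound model as a plain Python dictionary, and the key names ("MW", "logP", "pKa", "B/P", "fu", "Peff", "Vd", "CYP") are hypothetical labels chosen for this sketch, not the SimCYP compound file format or an ADMET Predictor output schema.

```python
from copy import deepcopy

# Hypothetical parameter labels; real SimCYP templates use their own fields.
PHYSCHEM_KEYS = ("logP", "pKa", "B/P", "fu")
ADME_KEYS = ("Peff", "Vd", "CYP")

KEYS_BY_VERSION = {
    "V1": ("MW",),                              # molecular weight only
    "V2": ("MW",) + PHYSCHEM_KEYS,              # plus physicochemical values
    "V3": ("MW",) + PHYSCHEM_KEYS + ADME_KEYS,  # plus Peff, Vd and CYP data
}


def build_target_model(template: dict, target_values: dict, version: str) -> dict:
    """Copy the template model and overwrite the parameters for `version`.

    template: parameters of the validated template drug model.
    target_values: MW and predicted parameters of the target drug.
    Parameters not covered by the chosen version keep the template values.
    """
    if version not in KEYS_BY_VERSION:
        raise ValueError(f"unknown version: {version!r}")
    model = deepcopy(template)
    for key in KEYS_BY_VERSION[version]:
        model[key] = deepcopy(target_values[key])
    return model
```

Everything outside the listed keys, such as formulation or trial settings stored with the template, is carried over unchanged, which mirrors the idea that the template supplies whatever cannot be predicted for the target.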
Evaluation of models for target drugs. To evaluate the performance of PBPK models with input parameters from ADMET Predictor, the simulated C-T curves were overlaid on the experimental data of the target drugs. To quantitatively evaluate how well the experimental and simulated curves overlay each other, we calculated the root mean square error (RMSE) [20] of the observed and predicted concentrations at different time points. The RMSE is calculated according to Eq. 2, where $C_{oi}$ and $C_{pi}$ represent the observed and predicted drug concentrations at time point i, and N is the number of time points (N > 1) in the extracted observed data. Specifically, in this study, to facilitate the comparison between models for different drugs with various concentration scales, we introduced the normalized RMSE (NRMSE) to evaluate the performance of PBPK models, which is calculated using the following formula:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{C_{max} - C_{min}}$$
where $C_{max}$ and $C_{min}$ are the maximum and minimum values among the observed concentrations and the concentrations predicted by all three versions of models.
The flowchart of the experimental protocol is shown in Fig. 1.
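The error metrics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' script: the RMSE is taken in its usual form, the square root of the mean squared difference over the N observed time points (the source does not show whether N or N - 1 is used as the divisor), and the predicted concentrations are assumed to have already been evaluated at the observed time points. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between matched concentration lists.

    Assumes the common form sqrt(sum((C_oi - C_pi)^2) / N).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need N > 1 matched observed/predicted points")
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / n)


def nrmse(observed, predictions_by_version, version):
    """RMSE of one model version divided by (C_max - C_min).

    predictions_by_version: maps a version label (e.g. "V1") to its
    predicted concentrations at the observed time points.
    C_max and C_min are taken over the observed values and the
    predictions of all versions pooled together.
    """
    pooled = list(observed)
    for values in predictions_by_version.values():
        pooled.extend(values)
    span = max(pooled) - min(pooled)
    if span == 0:
        raise ValueError("all concentrations are identical; NRMSE undefined")
    return rmse(observed, predictions_by_version[version]) / span
```

Because the same pooled range is used for all three versions of a given drug pair set, their NRMSE values share one scale and can be compared directly against the 0.2 cutoff.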

Results
Drug pair selection and validation of PBPK models for drug templates. Thirteen pairs formed from the 18 drugs, each with a calculated TS of at least 0.5, were selected for in silico PBPK modeling. Drug pairs with TS below 0.5 were not considered structurally similar and were excluded from this study. The calculated TS values for the 13 selected pairs (Groups A-M) are listed in Table 1. Since both drugs in a pair serve in turn as the template and the target drug for cross validation, we used X-1 and X-2 to label the two drugs in a pair, where X ranges from A to M. As shown in Table 2, apart from the drugs for which no observed PK parameters are available (Dextromethorphan, Mephenytoin and Fluoxetine), the predicted PK parameters of most drugs are within the standard deviation ranges of their observed values. The predicted values of Cmax, Tmax and AUC for Theophylline are all slightly beyond the margin of error but still within two standard deviations. Overall, as shown in Fig. 2, the observed C-T profiles are within the 95% confidence interval (CI) ranges (the upper and lower grey dashed curves) of the simulated C-T curves. Therefore, the PBPK models for the template drugs have been well validated.
Table 2 notes. Pred: predicted drug PK parameters from the unchanged SimCYP drug template (except Fluoxetine). Pred_V2: predicted drug PK parameters using SimCYP with input parameters (log Po:w, pKa, B/P and fu) from ADMET Predictor. Pred_V3: predicted drug PK parameters using SimCYP with input parameters (log Po:w, pKa, B/P, fu, Peff, Vd, and CYP parameters) from ADMET Predictor. Obs: drug PK parameters reported by clinical research. For Fluoxetine, the SimCYP drug template was modified so that the predicted profile fits the clinically reported curve.
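The pair selection and X-1/X-2 labelling described at the start of this subsection can be sketched as follows. The code is a hypothetical illustration: the drug names in the usage note are placeholders, the letters are assigned in order of increasing TS (an assumption about how Groups A-M were ordered), and which drug of a pair is the target in set X-1 is arbitrary here.

```python
from string import ascii_uppercase


def label_pair_sets(scores: dict, cutoff: float = 0.5) -> list:
    """Select drug pairs with TS >= cutoff and expand them into pair sets.

    scores: maps a (drug_a, drug_b) tuple to its Tanimoto score.
    Returns a list of (label, target, template, ts) tuples, two per pair,
    because each drug serves once as target and once as template.
    Supports up to 26 pairs (one letter per pair).
    """
    kept = sorted((ts, pair) for pair, ts in scores.items() if ts >= cutoff)
    if len(kept) > len(ascii_uppercase):
        raise ValueError("more pairs than available letters")
    pair_sets = []
    for letter, (ts, (drug_a, drug_b)) in zip(ascii_uppercase, kept):
        pair_sets.append((f"{letter}-1", drug_a, drug_b, ts))
        pair_sets.append((f"{letter}-2", drug_b, drug_a, ts))
    return pair_sets
```

For example, with placeholder scores `{("drug1", "drug2"): 0.95, ("drug1", "drug3"): 0.3, ("drug2", "drug3"): 0.6}`, the pair with TS 0.3 is dropped and the remaining two pairs yield four sets labelled A-1, A-2, B-1 and B-2.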
Evaluation of inherent differences among software platforms. The predicted PK parameters of the 18 drug templates modified by replacing the ADME parameters with those predicted by ADMET Predictor are listed in Table 2. The C-T profiles of those 18 drugs are shown in Fig. 3 (V2) and Fig. 4 (V3). In V2, most drugs exhibit satisfactory prediction results. As shown in Fig. 4, 14 out of 18 drugs have most of their experimental data points lying within the predicted confidence interval. Only Triazolam, Atomoxetine, Simvastatin and Pravastatin have nearly half or more of their data points outside the confidence interval, showing poor prediction performance. In V3, Bupropion, Caffeine, and Phenobarbital show a very good overlay between the clinical data and the result predicted from the modified drug template, with the observed data lying within the confidence interval of the predicted curve. For Fluoxetine, Alprazolam, Quinidine, and Triazolam, although the predicted results do not overlay the experimental data excellently, most of the clinical data points lie within the confidence interval of the predicted profiles. For Lorazepam, although the observed data all lie at or around the upper confidence limit of the predicted profile, the shape of the predicted curve closely resembles that of the observed PK profile. Unfortunately, the other drugs do not show very satisfactory prediction results when the clinical data points are used as the reference.
To quantitatively measure the deviation of predicted concentration profiles from the experimental data, the difference between observed and predicted values was evaluated by NRMSE ([...] Fig. 3). Interestingly, the NRMSE values of Fluoxetine (0.41), Alprazolam (0.28), Quinidine (0.53), and Triazolam (0.29) are quite different, even though the simulated C-T curves of the four drugs are relatively satisfactory. Taken together, both the overlay of the simulated C-T curves with the measured C-T data points and the NRMSE should be used to evaluate the quality of the ADME parameters predicted by ADMET Predictor. Overall, the ADME parameters predicted by ADMET Predictor produce satisfactory C-T curves in the SimCYP simulator for about half of the tested drugs.
Predicted concentration profiles for the in silico PBPK models. The C-T profiles predicted by all three versions (V1, V2 and V3) of PBPK models are shown in Fig. 5. The NRMSE between observed and predicted values was also calculated for each of the three versions and is summarized in Table 4. A table cell is marked with "*" if the NRMSE value of V1, V2, or V3 is lower than 0.2. For the sake of discussion, we divided the 13 drug pairs (26 drug pair sets) into three groups according to their Tanimoto scores.
Table 4 Calculated NRMSE between predicted (three versions) and experimental concentration profiles of drugs in each drug pair set. The NRMSEs of the target and template drugs, which are adopted from Table 3, measure the quality of the ADME prediction using ADMET Predictor and/or the inherent difference between the two software packages. The Tanimoto scores in the last column come from Table 1.
Group I (TS ≤ 0.7). As shown in Table 4, the performance of the three protocols does not show an obvious pattern for Group I. V1, V2 and V3 have two (A-1 and D-1), five (A-1, A-2, D-1, D-2 and F-1) and three (B-2, C-2 and D-2) pair sets in "*" table cells, respectively.
Most of those pair sets also exhibit a good overlay between the experimental data points and the predicted curves, as shown in Fig. 5, indicating that the collaboration between SimCYP and ADMET Predictor is good. For the other pair sets from A-1 to F-2, all three protocols have NRMSE values larger than 0.2 and the simulated C-T curves do not overlay the experimental data points well. Interestingly, for the D-2 drug pair set, although the NRMSE of the V2 model is the lowest, the C-T curve predicted by the V3 model fits the shape of the observed data better, as shown in Fig. 5. This is caused by the deviation of the first data point from the predicted V3 curve, which makes its NRMSE larger than that of V2. When this outlier is eliminated and the NRMSE is recalculated, V3 becomes the best for this pair set (the NRMSE values are then 0.57, 0.16 and 0.06 for the V1, V2 and V3 protocols, respectively).
Group II (0.7 < TS ≤ 0.9). This group contains 5 drug pairs, G-K. As shown in Table 4, most drug pair sets have at least one version with an NRMSE value lower than 0.2; the exceptions are H-1 and I-2. It is worth mentioning that the NRMSE value of I-2 is only 0.21 and that the predicted C-T curve is consistent with the experimental data (Fig. 5). The failure of the H-1 model is likely caused by problematic ADME parameters predicted by ADMET Predictor for the target drug. The "collaboration" between the two software packages should not be a problem for this drug pair, since the NRMSE values of H-2 are very low for both the V2 and V3 models (0.08 and 0.02, respectively). As shown in Table 4, the V3 models outperform the V1 and V2 models for most drug pair sets: 7 out of 10 V3 models have NRMSE values lower than 0.2, compared with none of the V1 models and 2 of the V2 models. Interestingly, for drug pair set J-2, the V2 and V3 models perform very similarly and both predict well, as shown in Fig. 5; for K-2, however, none of the three model versions gives a satisfactory prediction (Fig. 5), even though the NRMSE values of the V1 and V2 models are equal to or lower than the cutoff.
Group III (TS > 0.9). This group contains 2 drug pairs, L and M. As shown in Table 4, most models have satisfactory NRMSE values. For the L-1 and L-2 drug pair sets, the profiles predicted by the V2 and V3 models are very close to the clinical data points. For the M-1 and M-2 drug pair sets, in contrast, the V3 models perform very poorly. Although drug pair M has a Tanimoto score of 0.95, the V3 models perform poorly, while the V1 and V2 models have not only satisfactory NRMSE values but also C-T curves that overlay the measured data points very well. A likely explanation is that, for such similar drugs, the ADME parameters of the template and target differ only slightly, so retaining the template values (as in V1 and V2) introduces less error than substituting values from ADMET Predictor, which carry both prediction error and the error caused by the inherent difference between the two software packages. Indeed, the NRMSE values of the two drugs in drug pair M, 0.51 and 0.70, are very large (Table 4).

Discussion
In this study, we developed a novel approach to construct in silico PBPK models for target drugs that lack experimental ADME and other PK parameters, using an established PBPK model of a structurally similar drug as the model template. We used 18 drugs, which formed 13 drug pairs (A-M) and 26 drug pair sets (each drug in a pair serves alternately as template and target), to evaluate three ADME parameter substitution protocols, which correspond to three versions of PBPK models. The performance of the in silico PBPK models was critically evaluated using experimental PK profiles and parameters.
Practical guidance on selecting a suitable drug template. We attempted to obtain guidance on selecting a suitable template drug for a given target drug, focusing on structural similarity as the selection criterion. We found that drug pairs with a Tanimoto score higher than 0.70 (Groups II and III) tend to be predicted better across the three versions than drug pairs with a Tanimoto score of 0.70 or lower (Group I). Higher structural similarity between the two drugs of a pair thus increases the likelihood of good prediction results. After comparing the performance of all three versions of models, we developed the following guidance: for Group I drug pairs, V2 or V3 is recommended; for Group II drug pairs, V3 is recommended; and for Group III drug pairs, V2 is recommended. Following this practical guidance, 16 out of 26 drug pair sets have NRMSE values lower than 0.2, the threshold for recognizing a good PBPK model. Nevertheless, the prediction accuracy of ADMET Predictor and the extent of the inherent difference between it and SimCYP are also crucial factors that affect model performance. The evaluation of the error caused by combining the two software packages showed that the prediction accuracy of the modified drug templates varies from drug to drug, so the influence of the introduced error can differ greatly between drugs. Thus, the selection of the substitution strategy should consider the NRMSE values of both the template and the target drug. Unfortunately, in practice, only the NRMSE of the template drug is known. An algorithm that can predict the NRMSE value of an arbitrary compound is therefore needed to further improve the practical guidance.
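The guidance above amounts to a lookup on the Tanimoto score, which the following sketch makes explicit. The thresholds (0.5, 0.7 and 0.9) and recommended versions come from the text; the function name and the choice to return an empty tuple for pairs below the 0.5 similarity cutoff are assumptions made for this illustration.

```python
def recommend_protocols(ts: float) -> tuple:
    """Return the substitution protocol(s) suggested for a Tanimoto score.

    Group I   (0.5 <= TS <= 0.7): V2 or V3
    Group II  (0.7 <  TS <= 0.9): V3
    Group III (TS > 0.9):         V2
    Pairs with TS < 0.5 were not treated as structurally similar in the
    study, so no protocol is suggested for them here.
    """
    if not 0.0 <= ts <= 1.0:
        raise ValueError("Tanimoto score must lie between 0 and 1")
    if ts < 0.5:
        return ()
    if ts <= 0.7:
        return ("V2", "V3")
    if ts <= 0.9:
        return ("V3",)
    return ("V2",)
```

Such a rule only encodes the similarity criterion; as noted above, the NRMSE of the template drug (and, ideally, of the target drug) should also inform the final choice.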
Another possible method to evaluate the prediction results of the three versions. The prediction results of V1, V2 and V3 can also be evaluated by the fold-error in AUC of each predicted version relative to the clinical data. However, the fold-error in AUC only captures the difference between the total area under the predicted curve and that under the literature-reported PK curve, without delineating the shapes of the curves. In contrast, when RMSE is used as the evaluation method, the shape of the predicted C-T curve is reflected in the difference between predicted and observed drug concentrations at each time point. Furthermore, differences in dosage can lead to large RMSE discrepancies among drugs. For this reason, we normalized the RMSE to eliminate the influence of dosage on its value. Using NRMSE can help to reduce the false-positive rate.
Perspective on applying in silico PBPK modeling to compounds lacking experimental ADME and PK properties. The SimCYP simulator is an advanced software package with well-constructed drug PK models in its built-in drug library, with each drug template containing comprehensive drug parameters. It can intuitively show the simulated drug C-T curves that result from these parameters under different trial designs. ADMET Predictor, on the other hand, can predict many PK parameters of an input compound from its structural information alone. However, constructing a drug PK model needs a full set of PK parameters, and some of them cannot be predicted reasonably. For this reason, we can partially rely on the PK parameters of another compound that shares high structural similarity with the unknown target compound. In this study, we put forward a novel approach to build PBPK models for a target drug that lacks measured ADME and other PK parameters, using the PBPK model of a structurally similar template drug. We also proposed overall guidance on selecting a suitable template drug and using its PBPK model as the model template.
The success of this computational approach depends on two important factors: the availability of a high-quality PBPK model for the template compound, and the accuracy and consistency of the ADME and PK parameters predicted by ADMET Predictor for the target drug. Thus, the performance of the two software packages strongly affects the results of our study. Because ADMET Predictor is a calculator of ADMET properties, its predictions of drug properties may not be close enough to the true values, leading to errors when constructing drug models. Additionally, not all ADME/PK properties can be calculated with the current version of ADMET Predictor. For example, the prediction of metabolism in ADMET Predictor is limited to 5 common enzymes (CYP1A2, CYP2D6, CYP2C9, CYP2C19 and CYP3A4), and the predictions for drug-related transporters are reported only qualitatively rather than quantitatively. On the other hand, there are currently 70 established compounds in SimCYP's drug libraries (including both the substrate library and the inhibitor library), and the libraries are still under development. We tested 18 compounds that share structural similarity, and this study will be extended as more clinically validated PBPK models or related parameters for drugs in use become available.
Nevertheless, we have proposed a practical approach to generate PBPK models for a compound lacking experimental ADME/PK properties. Such a model can serve as the initial version of the PBPK model for the target compound, and its performance can be improved in the future using measured PK profiles and properties. The computational protocol introduced in this work may have great applications in selecting drug leads to enter the drug optimization phase or drug candidates to enter preclinical studies.

Conclusions
In this work, we have introduced and tested a novel computational protocol to develop in silico PBPK models for a compound lacking measured ADME/PK properties and PK profiles. The general idea is to choose as the template a PBPK model whose compound, the template drug, is structurally similar to the target drug. For the target drug, we calculated the ADME properties using ADMET Predictor from Simulations Plus, Inc. We have arrived at overall guidance on using this method to build PBPK models for an arbitrary drug. First, the structural similarity between the template and the target drug is very important, so template drugs with the highest structural similarity to the target drug should be considered first; second, once the template drug is selected, the ADME parameter substitution protocol is selected based on the Tanimoto score (TS) between the target and template drugs. If TS is equal to or smaller than 0.7, the V2 or V3 protocol is recommended; if TS is larger than 0.7 but smaller than or equal to 0.9, the V3 protocol is suggested; and if TS is larger than 0.9, V2 is recommended. Following this guidance, more than 60% (16 out of 26) of the PBPK models have satisfactory performance. We emphasize that this method relies heavily on the collaboration between SimCYP and ADMET Predictor as well as on the prediction accuracy of ADMET Predictor. The NRMSE values of the template and target drugs can guide the selection of a proper substitution protocol. If the NRMSE values are small, one can select a protocol in which many ADME parameters are substituted, such as V3; if the NRMSE values are large, adopting the V2 or V1 protocol can minimize the error due to poor "collaboration" between the two software packages. Unfortunately, the NRMSE value of the target drug is unknown in practice. A tool that can predict this NRMSE value is thus needed to further improve this method.
While future experimental work is definitely needed to further improve model performance, the novel approach proposed in this work can help identify drug candidates with favorable PK profiles, reducing experimental cost and providing insight for drug discovery and development.
Fig. 1 The flowchart of the experimental protocol.