Trends in PhysChem Properties of Newly Approved Drugs over the Last Six Years; Predicting Solubility of Drugs Approved in 2021

For the ‘small-molecule’ NMEs (new molecular entities) approved in 2021 by the US FDA, quantitative solubility values were found for 28 drugs, nearly all from published New Drug Applications (NDAs). Comparisons of physicochemical properties over the last six years indicate that the NMEs are slowly continuing to increase in size and decrease in solubility. Since 2016, the intrinsic solubility values (S0) have decreased on average by 0.50 log10 unit, the calculated octanol–water partition coefficients (clogP) have increased by 0.34 log10 unit, and the molecular weights (MW) have increased by 22 g·mol−1 (to 477, compared to 298 in older drugs). The average number of H-bond acceptors has remained constant, while the average number of H-bond donors and the Kier Φ molecular flexibility indices have decreased slightly. The reported solubility data for the 2021 small-molecule NMEs were processed using the program pDISOL-X to obtain S0 values, normalized to 25 °C. The S0 values ranged from 2 ng·mL−1 (avacopan) to 43 mg·mL−1 (viloxazine). In the new set, MW spanned from 233 g·mol−1 (dexmethylphenidate) to 1215 g·mol−1 (voclosporin). Values of clogP ranged from −0.3 (serdexmethylphenidate, a quaternary ammonium molecule) to 8.1 (avacopan). Five different in-silico models were used to predict the aqueous intrinsic log10 solubility of the 28 novel NMEs: (i) Yalkowsky’s General Solubility Equation (GSE(classic)), (ii) Abraham’s Linear Solvation Equation (ABSOLV), (iii) Avdeef–Kansy ‘Flexible-Acceptor’ General Solubility Equation (GSE(Φ,B)), (iv) Breiman’s Random Forest Regression (RFR), and (v) a consensus model based on (ii) and (iii) above. The various models were retrained with an enlarged version of the Wiki-pS0 database (currently at 7655 log10 S0 entries of drug-relevant molecules). The consensus model (r2 = 0.67, RMSE = 1.08) just slightly outperformed the other four models.
The relatively simple consensus prediction equation can be easily incorporated into spreadsheet calculations. As new drugs are approved, it will be important to continue monitoring the quality of measured solubility. Matching prediction to measurement is valuable when prediction methods are applied to virtual libraries, in order to seek opportunities to minimize pharmacokinetic risks of large, but otherwise promising, candidate molecules.

Keywords Flexible-acceptor general solubility equation · Abraham solvation equation · Kier molecular flexibility index · Intrinsic solubility · Partial least squares · Random forest regression
Abbreviations
S0: Aqueous intrinsic solubility (i.e., the solubility of the uncharged form of the API)
Sw: Solubility of the pure API (active pharmaceutical ingredient) in pure water
n: Number of measurements of log10 S0 in the training/test set
MPP: The measure of prediction performance [77] refers to the percent of 'correct' predictions, as defined by the count of absolute residuals $|\log_{10} S_0^{obs} - \log_{10} S_0^{calc}| \le 0.5$ divided by n. MPP is represented as a pie chart in the correlation plots (Fig. 5)
RMSE: Root-mean-square error, accounting for bias in the prediction of external test set solubility values: $\mathrm{RMSE} = [\frac{1}{n-1} \sum_i (y_i^{obs} - y_i^{calc})^2]^{1/2}$, where $y = \log_{10} S_0$
r2: Coefficient of determination, accounting for bias in prediction of external test set solubility values [79]: $r^2 = 1 - \sum_i (y_i^{obs} - y_i^{calc})^2 / \sum_i (y_i^{obs} - \langle y \rangle)^2$, where $y = \log_{10} S_0$, and $\langle y \rangle$ is the mean value of observed log10 S0
bias: Intercept in the regression fit: $y^{obs} = a + b\,y^{calc}$, where the slope factor is fixed at unity
SD: Standard deviation: $\mathrm{SD} = [\frac{1}{n} \sum_i (y_i^{obs} - \langle y \rangle)^2]^{1/2}$, where $\langle y \rangle$ = mean value of log10 S0
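The statistics defined in the abbreviations can be computed in a few lines. The following is a minimal sketch, not the authors' code: the function name `prediction_metrics` and its dictionary output are illustrative assumptions, and the bias is taken as the mean residual, which is the least-squares intercept when the slope is fixed at unity.

```python
import math

def prediction_metrics(y_obs, y_calc, tol=0.5):
    """Return MPP (%), RMSE, r2, bias and SD for paired log10 S0 values.

    Hypothetical helper; formulas follow the definitions in the abbreviations list.
    """
    n = len(y_obs)
    if n < 2 or n != len(y_calc):
        raise ValueError("need two equal-length sequences with at least 2 values")
    residuals = [o - c for o, c in zip(y_obs, y_calc)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(y_obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return {
        # percent of predictions within +/- tol log10 unit of observation
        "MPP": 100.0 * sum(abs(r) <= tol for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / (n - 1)),
        "r2": 1.0 - ss_res / ss_tot,
        # intercept a in y_obs = a + 1 * y_calc (least squares, slope fixed at 1)
        "bias": sum(residuals) / n,
        "SD": math.sqrt(ss_tot / n),
    }
```

Because r2 is computed against the identity line rather than a fitted line, it can be negative for a badly biased model.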

Introduction
The downward trend in the pharma R&D productivity from 1996 to 2011, as indicated by counting the number of new molecular entities (NMEs) approved each year, started to reverse after 2012. Although approvals of biologics have been steadily increasing, most new market introductions are still the so-called 'small molecules' [1]. In the last two decades of considerations of Lipinski's 'Rule of 5' (Ro5) drug chemical space [2], many emerging NMEs are larger, less soluble, more lipophilic and possess more H-bond acceptors, when compared to older drugs [3]. NMEs outside of Ro5 space are dubbed 'beyond the Rule of 5' (bRo5) drugs [3][4][5][6][7][8].
In 2021, 50 drugs were approved by the FDA. Of these drugs, 72% are considered 'small molecule' NMEs. But these 'small molecules' are trending to larger sizes. Such large molecules may be burdened with pharmacokinetic (PK) risks, because of poor solubility or low cell permeability, elevated cellular efflux, and increased metabolism. In drug discovery and early development, several strategies to mitigate some of the risks have been tried [5][6][7][8][9]. Flexible molecules with the capability to form intramolecular H-bonds (IMHBs) have been of particular interest, since these may increase drug solubility in water (e.g., by adopting hydrophilic 'extended' conformations) and enhance permeability across bio-membranes (e.g., by adopting hydrophobic 'folded' conformations) [7][8][9].
Important factors for the productivity gain since 2012 include improved methods, focusing both on biological activity, as well as on absorption-distribution-metabolism-excretion-toxicity (ADMET) characteristics. Given the large number of molecules considered in discovery projects, the in-silico prediction of molecular properties is a valuable first step in prioritizing molecules for further (resource-costly) in-vitro and in-vivo screening.
Current trends in ADMET in-silico modeling approaches place increasing emphasis on calculated input parameters. Physicochemical properties are still key as inputs (e.g., octanol-water partition coefficients, log10 P, ionization constants, pKa, etc.), but often these are calculated rather than measured values. The risk is that overall simulation approaches may carry substantial aggregate uncertainty, something that is not usually discussed.
Modeling approaches need reliable input parameters (descriptors) for validation and for the further improvement needed to shift from the relatively simple simulation of in-vivo profiles to the more challenging, truly blind prediction of in-vivo ADMET and PK/PD (pharmacokinetics/pharmacodynamics) profiles. Early optimization and selection of suitable biologically active candidate molecules could thereby be improved. Reduced or eliminated animal testing would make pharma R&D even more successful. With a huge number of input descriptors, one can obtain an impressive fit of in-vivo profiles. A questionable trend is evident, however: although multifactorial fits to in-vivo profiles appear satisfactory, they often yield little knowledge gain and possess limited predictive power. Correct (blind) prediction of human in-vivo profiles, based on fewer or no animal experiments and a minimal number of input descriptors, is a desirable goal. In multifactorial fitting, the use of measured descriptors needs to be clearly differentiated from descriptors which are purely calculated. Knowing the uncertainty in the calculated descriptors would help in estimating the aggregation of errors in the overall prediction. The increasing use of machine learning methods and artificial intelligence can lead to accurate predictions. However, understanding the basis of such predictions may not readily suggest the steps to take to improve the properties of tested compounds.
Solubility plays a key part in deeper understanding of PK risks [4]. To predict the solubility of novel compounds, a manually curated database (Wiki-pS0) of intrinsic solubility values of druglike molecules was assembled in 2011, with entries added steadily since then. A comprehensive mass-action nonlinear regression program, pDISOL-X, to analyze solubility-pH data was developed in parallel, with enhancements added periodically. To predict drug solubility, different computational methods were examined in a series of studies [10][11][12][13]. Initially [10], Breiman's Random Forest regression (RFR) machine learning method [14] was tested and compared to predictions of Yalkowsky's General Solubility Equation (GSE) [15] and Abraham's Solvation Equation (ABSOLV) [16]. Whether prediction models trained with small drugs could be used to predict the solubility of large bRo5 drugs was then considered [11]. A way to modify the traditional GSE by implementing Kier's molecular flexibility index (Φ) and Abraham's basicity descriptor (B) resulted in the novel 'Flexible-Acceptor' GSE(Φ,B) equation [12]. This equation was directed [13] to predict the intrinsic solubility values of 72 NMEs recently approved by the FDA (2016-2020) [17][18][19][20][21].
In the present study, the predictions are extended to the 2021 newly approved drugs. Also examined are the trends in the physicochemical properties. The upward growth in the number of H-bond acceptors may have leveled off most recently, but MW and clogP values appear to be still slightly on the increase, underlying the concomitant lower solubility. In-silico models to predict the solubility of such NMEs, and of molecules not yet synthesized, are expected to be an asset for early risk assessment [4].

Thermodynamic Basis of the General Solubility Equation (GSE)
Yalkowsky and coworkers [15,22,23] developed the General Solubility Equation (GSE), Eq. 1, to predict the solubility of liquid/solid nonelectrolytes (mostly industrial organic chemicals) in water. The method is particularly appealing since it requires no 'training.' Merely the melting point (mp in °C) and the octanol-water partition coefficient, either measured (log10 P) or calculated (clogP), are prerequisites for predicting solubility (log10 molar units):
(1) $\log_{10} S_0 = 0.5 - 0.01\,(\mathrm{mp} - 25) - \log_{10} P$
The thermodynamic basis of the equation was reviewed recently [13]. Briefly, the dissolution of a crystalline substance in water consists of two main contributions: (i) crystal lattice effect (XTL), i.e., the energy needed to break down the lattice to form a hypothetical 'supercooled liquid' (SCL), and (ii) solvation effect, i.e., the energy released as the SCL dissolves in water. The total solubility can be expressed as [22,23]
(2) $\log_{10} S_w = \log_{10} S_w^{XTL} + \log_{10} S_w^{SCL}$
where $\log_{10} S_w^{XTL} = -\Delta S_m (T_m - T)/(2.303\,RT)$; $\Delta S_m$ is the standard molar entropy of phase transformation, T is the absolute temperature (K), and $T_m$ is the melting point (K).
At 25 °C, the crystal lattice contribution takes the form of the melting-point term in Eq. 1:
(3) $\log_{10} S_w^{XTL} \approx -0.01\,(\mathrm{mp} - 25)$
Hansch and coworkers [24] showed that log10 S of simple liquid solutes correlated linearly with the octanol-water partition coefficients, $\log_{10} P \approx \log_{10} (S_{oct}^{liq} / S_w^{liq})$. On re-arrangement,
(4) $\log_{10} S_w^{liq} \approx \log_{10} S_{oct}^{liq} - \log_{10} P$
where $\log_{10} S_{oct}^{liq}$ is the log10 solubility of a liquid solute in octanol, ranging from −0.3 to +0.9 for small molecules [24]. Yalkowsky and coworkers rationalized $\log_{10} S_{oct}^{SCL} = 0.5$ in Eq. 1 [15].
Hansch's studies suggest that the constant coefficients in Eq. 1 might need to be modified for compounds from novel classes of chemical space. If the 'supercooled liquid' form of a large polar solute is not fully miscible with octanol, then the $\log_{10} S_{oct}^{SCL}$ contribution could be a negative number. A large molecule with a decreased $S_{oct}^{SCL}$ (due to decreased miscibility with octanol) is expected to have an increased $S_w^{SCL}$. This would lessen the contribution of lipophilicity to the predicted solubility.
In the Flexible-Acceptor GSE(Φ,B) model [12], the constants of Eq. 1 are replaced by three c-coefficients,
(5) $\log_{10} S_0 = c_0 + c_1 \log_{10} P + c_2\,(\mathrm{mp} - 25)$
with the variable coefficients modeled here as functions of Φ + B (Eqs. 5a-c; the best-fit functions are listed in Fig. 4). The c-coefficients as functions of Φ + B were determined by partial least squares (PLS open-source package from https://cran.r-project.org/web/packages/pls) analysis of solubility data sorted on values of Φ + B and uniformly binned into 18 groups of 123-775 points, to ensure nearly constant Φ + B increments, as described previously [12,13]. Since our last study [13], the database has accumulated nearly 1000 new entries. So, a new set of b-constants was determined in the current investigation, using drug-relevant molecules as the training set, but excluding new drugs from the training. Values of Φ were calculated from the two kappa and the heavy atom count descriptors provided by Landrum's RDKit open-source cheminformatics library [27]. Table 1 lists these Φ and B values.
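To make the GSE arithmetic concrete, the sketch below evaluates the classic equation, a three-coefficient variant, and Kier's flexibility index. Several things here are assumptions rather than statements from the text: the function names are hypothetical; the classic form is the widely published log10 S0 = 0.5 − 0.01 (mp − 25) − log10 P; the variable form c0 + c1·log10 P + c2·(mp − 25) is inferred from how the c-coefficients are described; treating liquids (mp below 25 °C) as having no lattice penalty is the usual GSE convention; and Φ is taken as Kier's κ1·κ2/N (with RDKit, the inputs would presumably be the Kappa1, Kappa2, and HeavyAtomCount descriptors).

```python
def gse_classic(clogp, mp_celsius):
    """Classic GSE: log10 S0 (molar) from clogP and melting point in deg C."""
    # Assumed convention: solutes that are liquid at 25 C carry no lattice term.
    lattice = 0.01 * max(mp_celsius - 25.0, 0.0)
    return 0.5 - lattice - clogp

def gse_variable(clogp, mp_celsius, c0, c1, c2):
    """Three-coefficient GSE form (assumed): c0 + c1*clogP + c2*(mp - 25)."""
    return c0 + c1 * clogp + c2 * (mp_celsius - 25.0)

def kier_phi(kappa1, kappa2, n_heavy):
    """Kier molecular flexibility index from two kappa shape indices
    and the heavy-atom count."""
    if n_heavy <= 0:
        raise ValueError("heavy-atom count must be positive")
    return kappa1 * kappa2 / n_heavy

# With c0 = 0.5, c1 = -1, c2 = -0.01 the variable form reduces to the classic GSE
# for any solid melting above 25 C.
example = gse_variable(2.0, 125.0, 0.5, -1.0, -0.01)
```

In the Flexible-Acceptor model the three coefficients would be looked up from the fitted functions of Φ + B rather than passed in as constants; those functions are not reproduced in the text, so they are left as inputs here.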

Abraham Descriptors and the ABSOLV Linear Model for Predicting Solubility
Abraham introduced five solvation descriptors: A, B, Sπ, E, and V [16,26]. Two of these constitute H-bond potentials: A is the H-bond acidity (donor strength) and B is the H-bond basicity (acceptor strength) of the solute. Sπ is the dipolarity/polarizability, E is an excess molar refraction in units of (cm3·mol−1)/10, and V is the McGowan characteristic molar volume in units of (cm3·mol−1)/100. Values of the descriptors were calculated from 2D structures using the ABSOLV algorithm [26] (cf., www.acdlabs.com) and are listed in Table 1 for the new drugs.
Abraham and Le [16] amended the ABSOLV model to predict intrinsic solubility (log10 molar; Eq. 6). The independent variables are the five solute descriptors, plus the cross product of the H-bond terms. The seven d-coefficients (an intercept and one coefficient per independent variable) were determined by PLS regression, using the training set database, exclusive of the new drugs set. Quaternary ammonium drugs and drugs with MW > 800 Da were each treated separately. The rest of the molecules were divided into four acid-base classes, with reference to predominant charge state at pH 7.4: acids (−), bases (+), neutrals (0), and zwitterions (±), as was done previously [10]. For each class, separate sets of d-coefficients were determined by PLS regression.
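The class-specific linear model described above can be sketched as follows. This is an illustration under stated assumptions, not the ABSOLV implementation: the `Solute` container, the function names, the ordering of the seven coefficients (intercept, A, B, S, E, V, A·B), and the precedence of the quaternary and MW > 800 checks over the acid-base classes are all hypothetical choices. No fitted coefficient values are given because the text refers to Table 2 for them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Solute:
    A: float  # H-bond acidity
    B: float  # H-bond basicity
    S: float  # dipolarity/polarizability
    E: float  # excess molar refraction
    V: float  # McGowan characteristic volume
    mw: float
    quaternary: bool = False
    charge_class: str = "0"  # predominant charge at pH 7.4: '-', '+', '0' or '+-'

def absolv_class(s):
    """Pick which set of d-coefficients applies (six classes in total)."""
    if s.quaternary:
        return "quaternary"
    if s.mw > 800.0:
        return "mw>800"
    return {"-": "acid", "+": "base", "0": "neutral", "+-": "zwitterion"}[s.charge_class]

def absolv_log_s0(s, d):
    """Linear model in the five descriptors plus the A*B cross term.

    d = (d0, dA, dB, dS, dE, dV, dAB); this ordering is an assumption.
    """
    d0, dA, dB, dS, dE, dV, dAB = d
    return (d0 + dA * s.A + dB * s.B + dS * s.S
            + dE * s.E + dV * s.V + dAB * s.A * s.B)
```

In use, one would keep a dictionary mapping each class name to its own coefficient tuple and call `absolv_log_s0(s, coefficients[absolv_class(s)])`.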

Statistical Machine Learning Random Forest Regression (RFR) Model
The RFR open-source 'randomForest' library for the R statistical software was downloaded from https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. The method works by constructing an ensemble of hundreds of decision trees employing about 200 RDKit-generated molecular descriptors [27]. The method was retrained with the presently enlarged database, excluding the newly approved drugs.
Table 1 Physicochemical properties of newly approved drugs (2021)

Sources of Solubility Data for the Test (New Drugs) and Training (Wiki-pS0 Database) Sets
The 2021 mini-review of FDA drug approvals by Mullard [1] was a convenient starting point to identify the new drugs and to begin the search for their solubility values.
Since the drugs are new, there are hardly any journal publications reporting properties of the compounds. Almost all the data were found in FDA filing documents. As part of the New Drug Application (NDA) process, the FDA Center for Drug Evaluation and Research (CDER, www.accessdata.fda.gov) publishes reports listing some physicochemical properties of compounds under consideration. There was virtually no experimental detail about the measurements in the published regulatory reports. Many of the reported solubility values are of drugs in water (Sw), with saturation pH not reported. When the temperature was not stated or was reported as 'room' or 'ambient,' it was assumed to be 23 °C for the purpose of calculations here. Given the dearth of experimental detail, it is a challenge to assess the quality of the reported measurements in most of the FDA reports. Nevertheless, there are high-quality data in some of the documents, where solubility measurements were published as a function of pH. Examples of some of these are presented below.
Of the 36 small-molecule NMEs approved in 2021, quantitative solubility values (39 independent measurements) were found for only 28, given that some solubility data are redacted or presented only as qualitative values (e.g., 'insoluble,' 'poorly soluble,' 'very soluble') in FDA reports. The reported values were transformed into the intrinsic solubility scale, S0, using known (or predicted when unavailable) pKa values, and adjusted to 25 °C [63] using the program pDISOL-X (in-ADME Research) [64][65][66][67][68][69][70]. Table 1 lists the solubility data (normalized as intrinsic values), along with the pKa values used in the data analysis at the temperatures of measurement.
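Regulatory reports usually state solubility in mass units, whereas the models work in log10 molar units. The unit conversion alone can be sketched as below; the function name, the unit table, and the MW of 500 used in the usage note are illustrative assumptions, and the pH and temperature corrections performed by pDISOL-X are not reproduced.

```python
import math

# Conversion factors to mg/mL (numerically equal to g/L).
_TO_MG_PER_ML = {"mg/mL": 1.0, "ug/mL": 1e-3, "ng/mL": 1e-6}

def log10_molar_solubility(value, unit, mw):
    """Convert a mass-based solubility to log10 of mol/L.

    value: reported solubility; unit: key of _TO_MG_PER_ML; mw: g/mol.
    """
    if value <= 0 or mw <= 0:
        raise ValueError("solubility and molecular weight must be positive")
    grams_per_litre = value * _TO_MG_PER_ML[unit]
    return math.log10(grams_per_litre / mw)
```

For a hypothetical compound of MW 500 g·mol−1, 1 mg·mL−1 corresponds to 0.002 M, i.e. log10 S of about −2.7.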
The structures of the 28 new drugs considered here are shown in Fig. 1. In dual-API drug products, each API was treated as a separate 'drug' in the data analysis.

Sources of Octanol-Water Partition Coefficients (clogP) and Melting Points (mp)
Values of clogP were used in Eqs. 1 and 5 in place of experimental log10 P values. These were calculated by the Wildman-Crippen sum of atomic contributions method in the open-source RDKit cheminformatics library [27]. Experimental mp values were employed where available or were calculated otherwise [75].

Data Reduction
About two-thirds of the drugs in Table 1 had their solubility measurements performed in two or more pH buffer solutions. This generally leads to more reliable determinations of log10 S0, provided pKa values are confidently known. For the rest of the drugs, S0 values were determined from the reported water solubility (Sw) values. In these cases, the pH of the saturated solutions was also calculated, assuming the Henderson-Hasselbalch (HH) equation is valid and the pKa value is reliable. When aggregates/complexes form or when supersaturation persists in the suspension, the HH equation does not accurately predict the shape of the log10 S-pH curve for ionizable molecules [65][66][67][68][69][70]. There is no direct way to recognize such anomalies just from a single Sw measurement.
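For a monoprotic compound, the Henderson-Hasselbalch correction from a solubility measured at a known pH to the intrinsic value is a one-line calculation. The sketch below is a simplified stand-in for the mass-action analysis done in pDISOL-X: the function name is hypothetical, only one ionizable group is handled, and the solid phase is assumed to be the uncharged form (no salt precipitation, aggregation, or supersaturation).

```python
import math

def log_s0_from_log_s(log_s, ph, pka, kind):
    """Intrinsic log10 S0 from log10 S at a given pH (monoprotic HH model).

    kind: 'acid' (HA <-> A-) or 'base' (BH+ <-> B).
    """
    if kind == "acid":
        ionized_ratio = 10.0 ** (ph - pka)   # [A-]/[HA]
    elif kind == "base":
        ionized_ratio = 10.0 ** (pka - ph)   # [BH+]/[B]
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # Total solubility S = S0 * (1 + ionized/neutral), so subtract the log term.
    return log_s - math.log10(1.0 + ionized_ratio)
```

As an illustration with made-up numbers, a base with pKa 8 measured at pH 6 has about 100 times more ionized than neutral species in solution, so its intrinsic solubility lies about two log10 units below the measured value.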
In cases where measured pKa values could not be found, they were calculated using the ChemAxon MarvinSketch v5.3.7 program (ChemAxon Ltd., https://www.chemaxon.com), as indicated by italic values in Table 1. In a few cases, it was possible to determine pKa values directly in the analysis of the log10 S-pH profiles (underlined values in Table 1).
Examples of experimental log10 S-pH profiles reported for some of the new drugs are shown in Fig. 2. The circle symbols represent the measured pH-dependent log10 S values at 25 °C. The solid curves represent best-fit regression curves. The dashed curves were calculated by the Henderson-Hasselbalch equation, using the best-fit log10 S0 and the supplied/refined pKa. Frame c is that of an ampholyte (sotorasib). The rest of the frames are of bases (daridorexant, finerenone, ponesimod, vericiguat, and avacopan). The data from the first five drugs appear to be well defined by the HH equation. It was possible not only to determine the best-fit log10 S0, but also the values of pKa (frames a, b, d, e) and the pKsp (frames c, e).
When profiles deviate from expected HH shapes, it may be possible to assess (and to correct for) the degree to which the measurements may be supersaturated or whether aggregates/complexes are forming [65][66][67][68][69][70]. Figure 2f (avacopan) shows such an example of 'anomaly,' where for pH > 4, the reported solubility points are higher than those expected for a solution saturated in the free base (dashed curve). The solubility values in the pH 1-3 interval, which lie on the diagonal portion of the HH curve, define the intrinsic value, based on the reported pKa. The elevated values above pH 4 may be due to (a) supersaturation of the suspension with respect to the free base during the measurement, (b) the solid being amorphous in the pH > 4 region, (c) the formation of aggregates of the neutral molecule, or (d) complex formation between the free base and the buffers in solution [67,68]. Solid-state characterization or LC/MS analysis of saturated solutions may be able to rule out some of the possibilities. The unfilled circle symbols above pH 6 were assigned zero weights in the regression analysis. Had only an Sw measurement in water been reported for avacopan, and had the pKa not been known, the intrinsic solubility might have been overestimated by an order of magnitude. There would have been no hint of any 'anomaly.'

Determination of the Three GSE Coefficients from Training Set iso-(Φ + B) Bins
The training set solubility data were sorted by Φ + B into 18 bins of increasing values. (Fewer bins were used in our previous study [13].) For a narrow range of Φ + B values in each bin, the three GSE c-coefficients in Eq. 5 were determined by linear PLS regression, in a similar way that Hansch et al. [24] had trained the GSE for different chemical classes of compounds. The c-constants are depicted by the points on the three curves in Fig. 4. The best-fit equations (cf., Eqs. 5a-c) as functions of Φ + B are listed in the figure. The c0 and c1 functions follow the previously reported trends [13]. Evidently, the dependence of solubility on flexibility and H-bond acceptor strength is mediated by solution-phase interactions [76]. The crystal lattice contribution depicted by the c2 function appears to show an upward trend with increasing Φ + B, which was not evident in the earlier study [13] based on a larger test set of newly approved drugs and a smaller training set of drug-relevant compounds. The solubility of the most flexible molecules appears not to depend on crystal lattice contributions, where c2 ~ 0 for Φ + B ~ 25.
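The bin-then-fit procedure can be illustrated with a small self-contained sketch. It departs from the text in ways that should be read as assumptions: ordinary least squares stands in for the PLS regression actually used, bins are cut at equal Φ + B widths (the text says only that the increments were nearly constant), and the record layout and function names are hypothetical.

```python
def bin_by_phi_plus_b(records, n_bins=18):
    """Split records (phi_plus_b, clogp, mp, log_s0) into equal-width bins."""
    lo = min(r[0] for r in records)
    hi = max(r[0] for r in records)
    width = (hi - lo) / n_bins or 1.0  # guard against identical values
    bins = [[] for _ in range(n_bins)]
    for r in records:
        idx = min(int((r[0] - lo) / width), n_bins - 1)
        bins[idx].append(r)
    return bins

def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [b] for row, b in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: bin lacks spread in clogP or mp")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_bin(bin_records):
    """Least-squares c0, c1, c2 in log10 S0 = c0 + c1*clogP + c2*(mp - 25)."""
    rows = [(1.0, clogp, mp - 25.0) for _, clogp, mp, _ in bin_records]
    ys = [r[3] for r in bin_records]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)
```

Each bin yields one (c0, c1, c2) triple; plotting the triples against the mean Φ + B of their bins gives curves of the kind described for Fig. 4.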
From the thermodynamics considerations, the c0 coefficient may be viewed as a measure of the solubility of the 'supercooled' liquid solute in octanol ($c_0 \approx \log_{10} S_{oct}^{SCL}$). Increasingly flexible molecules with strong H-bond acceptor character appear to be less miscible with octanol, as suggested by the decreasing c0 coefficients with increasing Φ + B (Fig. 4). Between bins 1 and 18, $S_{oct}^{SCL}$ decreases by five orders of magnitude. The GSE(classic) model assumes a constant 0.5 intercept in Eq. 1, which appears to be more consistent with rigid molecules (Φ + B ~ 2). Given that the c1 coefficient also changes with Φ + B, the precise thermodynamic interpretation of the c0 coefficient is less clear than in the classical derivation [15,22,23], where c1 is constant at −1.

ABSOLV Training
As was done previously [10], the training set molecules were considered separately in each of four acid-base classes, with reference to predominant charge state at pH 7.4: acids (−), bases (+), neutrals (0), and zwitterions (±) (cf., Fig. 7 in Ref [10]). In addition, the quaternary ammonium drugs and drugs with MW > 800 Da were treated as separate classes. The d-coefficients in Eq. 6 for each of the six classes were determined by PLS regression using the log10 S0 values from the database, excluding those of the 2021 NMEs. Table 2 summarizes the d-coefficients by classes.
The d-coefficients in Table 2 are close to those reported in Table 1 of Ref [10]. Although there are about 1000 additional entries in the present database compared to that used previously, the statistics in the older study are slightly better (e.g., the RMSE values for acids, bases, neutrals, and zwitterions were 0.98, 0.87, 1.01, and 0.77, resp., in Ref [10]). The residual plots in Fig. 7 in Ref [10] are visually indistinguishable from those presently calculated (data not shown).

Random Forest Model Training
As was done previously [10][11][12][13], the Random Forest Regression (RFR) internal validation was applied to a randomly selected 30% of the database, based on training using the other 70% of the database (exclusive of new drugs). For molecules like those of the current database, it is expected that their log10 S0 could be predicted with r2 = 0.89, RMSE = 0.66, with 73% of the molecules 'correctly' predicted. The actual prediction statistics of the test compounds did not reach the expectations of the training set.
Fig. 2 The solid red curves are the best fit to the measured data (circles), using the regression analysis program pDISOL-X. It was also possible to determine the pKa values in cases (a, b, d, e). The dashed curves were calculated using the Henderson-Hasselbalch equation, incorporating the pKa used and the refined log10 S0. In cases (c) and (e), it was possible to determine the salt solubility products
Figure 5 shows the results of the predictions of the solubility of the newly approved drugs (external test sets) by the four models, as measured log10 S0 vs. calculated log10 S0 correlation plots. Table 3 summarizes the results. The solid diagonals are identity lines. The dashed diagonals are ±0.5 log10 unit displaced from the identity lines. The measure of prediction performance (MPP) is indicated by the pie charts as the percentage of predicted values that are within ±0.5 log10 unit of the observed values [77]. Briefly, the four results (Figs. 5a-d) look similar. All the r2 values were in the range of 0.57 to 0.60, and RMSE values were between 1.18 and 1.22, while MPP values ranged from 25 to 46%. The GSE(classic) had the highest MPP and appeared to have a symmetrical distribution of residuals about the identity line. The other three models tended to overpredict the solubility of drugs with log10 S0 < −7, which may hint that those molecules possessed structural features not common to the training database. All models showed a systematic negative bias, ranging from −0.22 to −0.37 log10 unit.

Model Testing
Fig. 4 Re-training the Flexible-Acceptor GSE(Φ,B) model. The solubility data in the training set were sorted on Φ + B and then divided into 18 practically constant-value (Φ + B) bins. On the average, each bin contained about 413 log10 S0 measurements. For each bin (represented by a point in the plots), the three constants in Eq. 1 (cf., Eq. 5) were determined by PLS regression to best fit the intrabin solubility data. Quaternary ammonium compounds and newly approved drugs were excluded from the analysis
The consensus based on the average of the ABSOLV and GSE(Φ,B) models produced the best statistics, as indicated in Fig. 5e, with r2 = 0.67 and RMSE = 1.08. The discrimination between the four models was higher in our previous study [13], covering the NMEs from 2016-2020.
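The consensus prediction is simply the arithmetic mean of the two model outputs, which is why it transfers easily to a spreadsheet. A minimal sketch follows; the function names are hypothetical, and the equal weighting reflects the "average" stated in the text.

```python
def consensus_log_s0(absolv_pred, gse_phi_b_pred):
    """Average of the ABSOLV and GSE(phi,B) log10 S0 predictions."""
    return 0.5 * (absolv_pred + gse_phi_b_pred)

def consensus_series(absolv_preds, gse_phi_b_preds):
    """Apply the consensus to paired lists of predictions."""
    if len(absolv_preds) != len(gse_phi_b_preds):
        raise ValueError("prediction lists must have equal length")
    return [consensus_log_s0(a, g) for a, g in zip(absolv_preds, gse_phi_b_preds)]
```

In a spreadsheet the same calculation is a single cell formula of the form `=AVERAGE(B2, C2)`, with the two model predictions in columns B and C.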

More is Needed than Just Increasing the Size of the Training Set
Although the database has steadily increased in size over the last ten years, it has been our observation that this alone has not proportionately improved its ability to predict the solubility of drugs. Generally, the GSE(classic) underperformed when compared to the other models. The GSE(Φ,B) matched the performance of the RFR model. Metrics such as those in Fig. 5 are comparable to those previously reported [10][11][12][13], although for the 2021 NMEs, the statistics are somewhat worse than those for the 2016-2020 NME set. Solubility prediction depends on multi-dimensional factors, e.g., quality of measurements (training and test sets), distribution of training set molecules in chemical space in relation to the tested drugs, and sensitivity of descriptors used in prediction models. Simply increasing the size of the solubility training set may not lead to improved predictions. Compiling a large database aimed at maximizing chemical diversity may be an inefficient strategy for predicting the solubility of novel molecules, given the enormous size of the chemical space, and since drugs appear to exist there as small tight clusters, as pointed out by Lipinski [78]. It would be helpful if the quality of future measurements were to improve. This could be better assessed in peer-reviewed publications than in regulatory filings. New descriptors which can better differentiate the factors affecting solubility may also be important for narrowing the gap between the accuracy of the prediction models and that of the experimental data.

Conclusion
If good practices in solubility measurement were adhered to, as detailed in the recent data-quality 'white paper' by experts from six countries [69], and the experimental details were more transparent, newly reported measurements could be expected to achieve precision approaching that of the curated database used as the training set (average interlaboratory SD < 0.2 log10 unit). Presently, the data quality in the database is not the limiting factor in prediction, given that the best prediction root-mean-square error achieved in this study is above a log10 unit. The benchmark statistical machine learning approaches are probably up to the task of narrowing the gap between prediction and measurement. The Flexible-Acceptor GSE(Φ,B) performed nearly as well as the benchmark Random Forest regression method in predicting the aqueous intrinsic solubility of the newly approved drugs since 2016. The consensus model based on the average predictions of the ABSOLV and GSE(Φ,B) methods was found to reduce the prediction biases in the separate methods and, perhaps even more significantly, it slightly outperformed the Random Forest regression method overall. This is an advantage since the relatively simple consensus model can be readily incorporated into spreadsheet calculations.
As new drugs are approved, it will be important to continue monitoring the quality of measured solubility. Matching prediction to measurement can be of immense practical value when prediction methods are applied to virtual libraries, in order to seek opportunities to minimize pharmacokinetic risks of large, but otherwise promising, candidate molecules.