Workflow overview. The objective is to make accurate RT predictions for a given CM using publicly available datasets from other CMs. The post–projection calibration method is developed to calibrate the effect of LC setups on these predictions. The workflow is shown schematically in Fig. 1. In the first step, a multiple CMs–based retention time (MCMRT) database was developed. This database includes > 9000 experimental RTs for 343 molecules with a high diversity of chemical structures and 30 CMs with different LC setups. A total of 33 molecules are selected from the database based on their retention behaviors and used as calibrants.
In the second step, the experimental RTs of the calibrants from a publicly available dataset (i.e., input CM) and a given CM (i.e., output CM) are used to generate a projection model. This public dataset is also used to train a QSRR model for molecules whose experimental RT was not recorded in the dataset. If experimental RTs are not available for all calibrants in the dataset, the RTs predicted by the QSRR model are used to replace the missing values. With the projection model, the dataset's experimental RTs and the QSRR model's predicted RTs can be transferred onto the given CM.
In the final step, a reference–input CM is selected based on the similarity of its elution pattern to that of the input CM. The RTs of the 33 calibrants are determined on the reference–input and output CMs, and the reference–projection model is generated from these experimental RTs. In untargeted analysis, the RTs of putative candidates for an unknown identity observed on the output CM are predicted with the projection model, based on their experimental RTs in the dataset or the RTs predicted by the QSRR model. The reference–projection model yields a reference–predicted RT (rpRT) for the observed unknown identity. The rpRT can be used instead of the experimental RT for comparison with the predicted RT. This method can calibrate the projection error caused by LC setups, enabling accurate transfer of RT data among a wider range of CMs and improving the annotation of unknown molecules.
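The projection step in the workflow above can be sketched in a few lines. This is only an illustration: it assumes a piecewise-linear mapping through the calibrant RT pairs, whereas the actual model form is described in Methods and may differ. The function name `fit_projection` and the calibrant values are hypothetical.

```python
import numpy as np

def fit_projection(rt_input, rt_output):
    """Return a function mapping RTs on the input CM onto the output CM.

    Assumption: piecewise-linear interpolation through the calibrant RT
    pairs, sorted by input RT. This is not necessarily the paper's model.
    """
    rt_input = np.asarray(rt_input, dtype=float)
    rt_output = np.asarray(rt_output, dtype=float)
    order = np.argsort(rt_input)
    x, y = rt_input[order], rt_output[order]

    def project(rt):
        # np.interp clamps RTs outside the calibrant range to the end points.
        return np.interp(rt, x, y)

    return project

# Hypothetical calibrant RTs (min) on the input and output CMs.
calib_in = [1.0, 3.0, 5.0, 9.0]
calib_out = [2.0, 6.0, 10.0, 18.0]

project = fit_projection(calib_in, calib_out)
projected = project([2.0, 7.0])  # RTs transferred onto the output CM
```

Under this assumption, a reference–projection model could be fitted the same way from the calibrant RTs on the reference–input and output CMs.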
The MCMRT database. The MCMRT database was developed for three primary purposes: ⅰ) selecting appropriate calibrants, ⅱ) providing concrete examples to demonstrate the limited applicability of the projection approach, and ⅲ) studying the capability of the post–projection calibration approach. First, we selected 343 molecules from various chemical classes and obtained their standard materials from suppliers. Then we acquired their RT data on 30 different CMs via RPLC/ESI–HRMS. Details of RT data acquisition are provided in “Methods”. These molecules covered broad ranges of octanol/water partition coefficient values (log Kow −8.1 to 11.6) and molecular weights (89–1449 Da), and therefore covered the entire RT range in RPLC as well as a range of mass–related properties, e.g., both positive and negative modes and six representative adducts in ESI–MS (Fig. 2a). In terms of chemical classes, they covered 11 ClassyFire groups (superclass level)33, including benzenoids (27.7%), organic acids and derivatives (20.4%), organoheterocyclic compounds (18.7%), lipids and lipid-like molecules (9.9%), phenylpropanoids and polyketides (7.6%), organohalogen compounds (7.3%), organic oxygen compounds (3.5%), organosulfur compounds (1.2%), organic nitrogen compounds (1.2%), organophosphorus compounds (1.2%) and other compounds (1.5%). Notably, the METLIN database (80,038 molecules) covered seven superclasses, and MCMRT also included these classes, except nucleosides and nucleotides19. Furthermore, the organohalogen compounds (e.g., perfluorinated and polyfluoroorganic compounds) and organosulfur compounds contained in MCMRT were not observed in METLIN. Diverse element compositions (C, H, O, N, P, S, Cl, Br, F and I) indicated that these molecules have a wide range of physicochemical properties (Fig. 2b). These results demonstrated that the molecules in MCMRT are highly diverse and structurally representative. Detailed information, including molecular formula, molecular weight, log Kow, polarity response, chemical class, etc., can be found in Supplementary Table S1.
The 30 CMs covered six C18 columns with different specifications (50–150 × 2.1–4.6 mm, 1.7–5 µm), six mobile phase compositions with different buffers (acidic, ammonium, mixed and semi–mixed), nine running times (10–100 min), seven gradient profiles (single or multiple gradients), five flow rates (constant or variable flow rate, 0.2–1 mL/min), and three column temperatures (30, 40 and 45 °C). More details about the instrumental and chromatographic conditions are described in Supplementary Table S2. Among these CMs (marked as CM 01–CM 30, Fig. 2c), many pairs differ in only one chromatographic parameter. For example, CM 04 and CM 05 differed only in the C18 column, CM 03 and CM 30 only in the mobile phase composition, CM 01 and CM 13 only in the running time, and CM 09 and CM 10 only in the gradient profile. We used such CMs to study the effect of a single chromatographic parameter on the elution pattern, as the hypothesis behind reliable RT projections is that the molecule elution order is largely conserved9.
As a result, a total of 9,018 experimental RT values from 30 CMs were included in the MCMRT. Of the 343 molecules, 249 had experimental RT values on all CMs, i.e., they overlapped between these CMs. About 16 to 69 molecules were not analyzed or were undetectable on different CMs, mainly because their preferred adducts showed low ionization efficiency. Figure 2c shows the distribution of experimental RTs across the 30 CMs for all molecules in MCMRT. These RTs were evenly distributed within the running time of each CM, indicating that these molecules cover the entire RT range in LC and have no obvious preference for specific RT ranges. The experimental RT values are available in Supplementary Table S3.
The calibrants. The calibrants were also used to select an appropriate reference–input CM, which has high similarity in elution pattern with the input CM. Therefore, the selection of calibrants included the following considerations: 1) they can be detected by LC/ESI–MS; 2) they can cover the entire RT range in RPLC; 3) they can account for the effect of LC setups on molecule elution order. The latter two features enable the calibrants to demonstrate the overall elution pattern for various CMs, i.e., by knowing the RTs of calibrants on the input and reference–input CMs, the similarity in elution pattern between the two CMs can be estimated. Based on the third consideration, cluster analysis of the RT matrix in MCMRT was performed using a self–organizing map (SOM) neural network34,35 to characterize the retention behavior of the 343 molecules under different LC setups. These molecules were classified into 25 groups with different RT distributions, and the molecules within each group exhibited similar retention behavior (Fig. 3a). From each group, at least one molecule was selected as a calibrant. The set of calibrants ultimately contained 33 molecules (Fig. 3b). As shown in Figs. 3a–b, the elution patterns demonstrated by the 33 calibrants and the 343 molecules were highly consistent. For example, the R2 of experimental RTs between CM 24 and CM 26 (the two least similar CMs) differed by only 3.3% between the two sets (0.651 vs. 0.673, Figs. 3a–b). These results indicate that the calibrants could determine the CM similarity in the elution pattern. The details of retention behavior classification and calibrant selection are provided in “Methods”. The chemical information for calibrants and the group of retention behavior for all molecules are available in Supplementary Table S3.
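The use of calibrant RTs to estimate CM similarity, and hence to pick a reference–input CM, can be illustrated as follows. This sketch assumes that R2 is the squared Pearson correlation between calibrant RTs on two CMs; the function names, CM labels and RT values are hypothetical.

```python
import numpy as np

def r_squared(rt_a, rt_b):
    """Squared Pearson correlation of calibrant RTs (assumed definition of R2)."""
    a = np.asarray(rt_a, dtype=float)
    b = np.asarray(rt_b, dtype=float)
    # Ignore calibrants missing (NaN) on either CM.
    ok = ~(np.isnan(a) | np.isnan(b))
    r = np.corrcoef(a[ok], b[ok])[0, 1]
    return r * r

def pick_reference_cm(input_rts, candidate_rts):
    """Return the candidate CM whose calibrant RTs correlate best with the input CM."""
    scores = {name: r_squared(input_rts, rts) for name, rts in candidate_rts.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical calibrant RTs (min) on the input CM and two candidate CMs.
input_rts = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "CM_x": [2.0, 4.0, 6.0, 8.0, 10.5],  # elution order conserved
    "CM_y": [5.0, 1.0, 4.0, 2.0, 3.0],   # elution order scrambled
}
best, scores = pick_reference_cm(input_rts, candidates)
```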
Before using calibrants for post–projection calibration, the differences in elution patterns caused by specific LC setups were also analyzed. As shown in Fig. 3c, CMs with similar C18 column specifications gave not only a largely conserved molecule elution order but also consistent RT values, with an R2 of 0.992 for the 33 calibrants and a maximum difference in RT of 1.68 min. Nevertheless, differences in RT were observed for some molecules on CMs with different C18 column specifications (Figs. 3d–e), especially when the mobile phases were acidic, which reduced the R2 to 0.968 (Fig. 3f). For CMs with different flow rates, the elution order was largely conserved, with an R2 higher than 0.973, although the RT values changed for some molecules (Fig. 3g). We also found that the greater the difference in running time between CMs, the lower the R2 between RTs and the greater the difference in RT for most molecules (Figs. 3h–i). For example, the RTs between CM 02 (15 min) and CM 12 (45 min) yielded an R2 of 0.937 and a maximum RT difference of 11.6 min (Fig. 3h). By comparison, the RTs between CM 01 (10 min) and CM 16 (100 min) yielded an R2 of 0.924 and a maximum RT difference of 64.8 min (Fig. 3i). Such differences in RT indicated the importance of using an RT data set acquired on a specific CM for training QSRR models and the necessity of predicting RTs on different CMs. Mobile phase composition negatively affected molecule elution order, as the RT differences varied for many more molecules. This is because the additives may change the pH value and ionic strength of the solvents, which would change the retention behavior of many molecules by protonation, deprotonation or ionic interaction. Specifically, similar mobile phases yielded an R2 of 0.936–0.979 (Figs. 3j–m). The mixed and semi–mixed buffers, mixed and acidic buffers, and mixed and ammonium buffers yielded an R2 of 0.903, 0.904 and 0.899, respectively (Figs. 3n–p). The lowest R2 of 0.752 was observed between ammonium and acidic buffers (Fig. 3q). In addition, the CM similarity is further reduced when the LC setups differ in several parameters. For example, CM 24 and CM 26, which have different C18 columns, running times, flow rates and mobile phase compositions, yielded an R2 of 0.651 (Figs. 3a–b).
Post–projection calibration improves experimental–experimental projections. With the MCMRT database, we first studied the effect of specific LC setups on RT projection using experimental–experimental projections (Supplementary Tables S4–6). CMs 03, 22 and 28, which have different mobile phases, were each used as an input CM. For each input CM, the remaining 29 CMs were each used as an output CM (see Methods for details). The distribution of the relative projection error for each output CM is shown in Figs. 4a, g, i. The projection errors varied depending on the running time and mobile phase composition of each output CM. Thus, these output CMs were divided into four groups: CMsA, B, C and D. CMsA and B contained CMs with the same mobile phase composition as the input CM, whereas CMsC and D contained CMs with mobile phase compositions different from that of the input CM. CMsA and C contained short CMs (< 45 min), whereas CMsB and D contained long CMs (≥ 45 min). Using CM 03 as the input CM, the median errors across all projections in CMsA and B were 0.9 and 1.5%, which were smaller than those in CMsC and D (3.5 and 8.1%, Fig. 4a). Similar results were obtained using CM 22 and CM 28 as input CMs. Specifically, using CM 22 as the input CM, the median error across all projections in CMsA, B, C and D was 1.0, 3.4, 5.7 and 7.1%, respectively. There was no output CM in CMsB when using CM 28 as the input CM, and the median error across all projections in CMsA, C and D was 1.3, 5.6 and 9.0%, respectively (Fig. 4i). These results demonstrated that experimental–experimental projections allow RT transfer between two CMs with the same mobile phase and short running times with very high accuracy (median error < 1.8%). Slight differences in the C18 column, gradient, flow rate and column temperature hardly increased these errors. However, a long running time could increase the median error to 3.4% (e.g., for CM 16, with a running time of 100 min). Different mobile phase compositions reduced the projection accuracy further31, with median errors between 5.4 and 14.3%.
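The grouping of output CMs and the error summary used above can be written compactly. The sketch below assumes that the relative error is |projected RT − experimental RT| / experimental RT × 100; the exact definition is given in Methods. The function names and RT values are hypothetical.

```python
import numpy as np

def cm_group(same_mobile_phase, run_time_min):
    """Assign an output CM to group A, B, C or D as defined in the text.

    A/B: same mobile phase composition as the input CM; C/D: different.
    A/C: short CMs (< 45 min); B/D: long CMs (>= 45 min).
    """
    short = run_time_min < 45
    if same_mobile_phase:
        return "A" if short else "B"
    return "C" if short else "D"

def relative_errors(projected_rt, experimental_rt):
    """Relative error in percent (assumed definition, see lead-in)."""
    p = np.asarray(projected_rt, dtype=float)
    e = np.asarray(experimental_rt, dtype=float)
    return np.abs(p - e) / e * 100.0

# Hypothetical projected and experimental RTs (min) on one output CM.
errs = relative_errors([10.5, 19.0, 33.0], [10.0, 20.0, 30.0])
median_error = float(np.median(errs))
group = cm_group(same_mobile_phase=False, run_time_min=100)
```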
Next, we demonstrated the calibration of projection errors with the proposed post–projection calibration approach (Supplementary Tables S4–6). CMs 07, 24 and 27 were used as reference–input CMs for input CMs 03, 22 and 28, respectively (see Methods for details). The distribution of the relative calibrated error for each output CM is shown in Figs. 4d, h, j. The median error for an output CM can be as low as 0.3% and was, in all cases, below 3.8%. Compared with the values before calibration, the median error across all predictions in CMsA, B, C and D was reduced by 25.6, 60.5, 83.2 and 79.5%, respectively. These results indicated that the error from the projection itself could be calibrated by using rpRT instead of experimental RT. The post–projection calibration method improves the RT transfer between CMs with significant differences in LC setups, e.g., long running times and different mobile phase compositions. After calibration, the proportion of molecules with an RT error higher than 10% was, in all cases, below 7.3% (N = 22), while before calibration it can be as high as 71.7% (N = 218).
Then, we studied the effect of reference–input CMs on post–projection calibration (Supplementary Table S4). Five CMs were each used as the reference–input CM for input CM 03. They were CM 01 with R2 (i.e., of RTs between input and reference–input CMs for calibrants) = 0.923, CM 18 with R2 = 0.981, CM 15 with R2 = 0.990, CM 05 with R2 = 0.993 and CM 07 with R2 = 0.998. The distribution of the relative projection error before and after calibration for each output CM is shown in Figs. 4a–f. The results demonstrated that the similarity in molecule elution order between input and reference–input CMs determines the accuracy of the calibration. Specifically, the median error across all predictions for all output CMs was 4.7% before calibration (Fig. 4a). After calibration, this error changed by −22.1% (i.e., it increased) using CM 01 (Fig. 4b) and decreased by 31.4, 64.0, 52.0 and 75.6% using CM 18 (Fig. 4f), CM 15 (Fig. 4e), CM 05 (Fig. 4c) and CM 07 (Fig. 4d) as reference–input CMs, respectively. A consistent trend was observed for all output CMs: the calibration accuracy increased as R2 increased from 0.923 to 0.998. Because CM 01 is only weakly similar to the input CM 03, most projections had a greater error after calibration than before, especially for the CMs in CMsA and B. These results indicated that reference–input CMs that are highly similar to the input CM, e.g., CMs 05, 07 and 15, gave highly accurate calibration, with a median error of less than 3.8% in all cases.
We also demonstrated that the number of molecules used for model training determines the projection and calibration accuracy (Fig. 4k and Supplementary Table S7). Specifically, as the number of calibrants increased from 33 to 151, the median error across all predictions in CMsA, B, C and D was reduced by 19.8, 22.2, 39.3 and 40.6%, respectively. After calibration, the corresponding reductions were 11.4, 7.4, 10.3 and 29.9%. The median error after calibration for an output CM was in all cases below 3.8%, indicating that a small number of calibrants can achieve high accuracy for post–projection calibration. For sets A and B, which contain the same number of calibrants, the projection errors for each output CM varied; the median error before calibration using set A was smaller than that using set B, while after calibration, the median error using set A was greater than that using set B. This can be attributed to the fact that the difference in the retention behavior of molecules in set B is smaller than that in set A, resulting in overfitting of the projection models. However, the smaller differences in the retention behavior of molecules in set B also gave a higher similarity in calibrant elution order between input and reference–input CMs (R2 = 0.998), resulting in better calibration than with set A (R2 = 0.993). In order to reduce experimental costs and demonstrate the overall elution pattern for a CM, the 33 molecules in set A were accepted as calibrants.
Post–projection calibration improves predicted–experimental projections. The experimental–experimental projection and calibration approach enables the sharing and utilization of experimental RTs across laboratories, independent of LC setups. Nevertheless, direct projection of experimental RTs between CMs can only share the RTs of molecules already recorded in the data sets. We further tested the scalability of the post–projection calibration approach by using a publicly available dataset (Supplementary Table S8) to deploy a QSRR model with an artificial neural network (ANN). This dataset contains 1,820 emerging contaminants whose RTs were experimentally determined using CM 03online. From all molecules in the dataset (except for 145 that overlapped with our 343 molecules), 75% (N = 1258) were randomly selected and used as a training set, whereas the remaining 25% (N = 408) were used as a validation set (see Methods for details).
Prediction results of the proposed ANN for the training and validation sets are shown in Figs. 5a–b and Supplementary Table S8. The mean and median relative errors for the validation set were 13.8 and 9.0%, and the mean and median absolute errors were 0.8 and 0.6 min. In a study with the same dataset, Aalizadeh et al.18 used a genetic algorithm combined with a support vector machine for modeling and reported performance similar to that of the ANN, with mean and median relative errors of 12.3 and 8.6%, respectively. From all molecules in MCMRT, the 293 with known experimental RTs on CM 03local were used as an external set to validate the generalization ability of the model. The performance was estimated by projecting their predicted RTs from CM 03online onto CM 03local via predicted–experimental projections and comparing them with their experimental RTs (see Methods for details). The mean and median relative errors were 13.7% and 9.2%, respectively (Fig. 5c and Supplementary Table S8). The molecules with a large predicted error (> 2 min) were mainly perfluorinated compounds and some compounds with high molecular weight (i.e., valinomycin, iodixanol), for which no similar structures were found in the dataset. This demonstrated that the structural similarity between the training dataset and the input chemical structure affects the accuracy of predicted RTs19.
To study the effect of different predicted errors on post–projection calibration, these molecules were classified into four groups (1–4) based on their predicted error: 206 molecules with an error less than 1 min, 51 molecules with an error between 1 and 2 min, 32 molecules with an error between 2 and 4 min, and 4 molecules with an error greater than 4 min (Fig. 5d). The median relative error for the four groups was 4.6, 16.2, 33.2 and 76.1%, respectively. Their predicted RTs were projected onto 30 local CMs in MCMRT (excluding CM 07local, which was used as the reference–input CM). The predicted–experimental projection results before and after calibration are shown in Figs. 5e–h and Supplementary Table S9. The relative errors across all projections varied depending on the predicted error; the median error in CMsD ranged from 10.7 and 21.6% with a predicted error smaller than 2 min (groups 1 and 2, respectively) to 39.8 and 80.1% with a predicted error larger than 2 min (groups 3 and 4, respectively) (Fig. 5e). Similar results were also observed for CMsA, B and C, as well as for their calibrations (Figs. 5f–h). Furthermore, as the similarity in elution pattern between input and output CMs decreased, the projection error in group 1 increased significantly. These results indicated that both predicted error and CM similarity determine the accuracy of predicted–experimental projections. For output CMs with mobile phases different from that of the input CM (Fig. 5f), statistically significant differences in mean error were observed before and after calibration for groups 1, 2 and 3 (Bonferroni test, P < 0.01, ngroup1 = 2159, ngroup2 = 536, ngroup3 = 336). The mean error across all predictions in CMsA, B, C and D was reduced by 5.4, 6.0, 41.2 and 46.8%, respectively, for molecules in group 1. The median error for an output CM was in all cases below 7.6% after calibration, while before calibration it ranged from 2.8 to 11.4%. These results demonstrated that the post–projection calibration approach enabled a more accurate transfer of predicted RTs onto the less similar CMsC and D. No significant negative impact on projection error was observed for the highly similar CMsA and B. Notably, the inherent prediction error from the QSRR model cannot be calibrated with this approach. Thus, a robust QSRR model, which enables accurate RT prediction for various molecular structures, is also required to support predicted–experimental projections.
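The classification of molecules into groups 1–4 by predicted error can be reproduced with a simple binning step. In this sketch, the bin edges (1, 2 and 4 min) come from the text, but the handling of values that fall exactly on an edge is an assumption, and the error values are hypothetical.

```python
import numpy as np

def error_group(abs_error_min):
    """Map absolute predicted errors (min) to groups 1-4.

    Edges from the text: < 1, 1-2, 2-4 and > 4 min. Assumption: a value
    equal to an edge is placed in the higher group.
    """
    edges = [1.0, 2.0, 4.0]
    return np.digitize(abs_error_min, edges) + 1

# Hypothetical absolute predicted errors (min) for four molecules.
errors = np.array([0.3, 1.5, 3.2, 6.0])
groups = error_group(errors)
```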
Implication of post–projection calibration for isomer annotation. With the predicted–experimental projection and calibration approach, we further investigated its implications for improving the ranking of isomeric candidates and filtering false–positive candidates. This was done to simulate a typical situation in untargeted analysis with LC–HRMS, where the chemical formula of an unknown identity is annotated (predicted) by the built–in software of the instrument, but the molecular structure is not known. From all the molecules in MCMRT, 136 were selected as unknown identities, with monoisotopic masses from 98 to 1449 Da and log Kow values from −8.1 to 9.8. We searched their chemical formulas on the ChemSpider and PubChem websites, resulting in a total of 2343 isomeric candidates (2–40 for each). The RTs of these candidates were predicted using the ANN model and transferred from CM 03online to CMs 08local, 15local, 24local and 26local, respectively. CM 07local was used as the reference–input CM to derive their rpRT. Results from all predictions, projections and calibrations for these candidates are available in Supplementary Table S10.
We used a receiver operating characteristic (ROC) curve to determine the optimal error threshold (see Methods for details). Figures 6a–d show the ROC curve for each CM. The area under the curve (AUC) after calibration was in all cases above 0.72, while before calibration the AUC values ranged from 0.65 to 0.73. The best filtering error threshold was 11, 14, 15 and 8% for the four CMs. Compared to before calibration, a significant decrease in this threshold was observed for CM 24local (25% vs. 15%). In addition, the optimal threshold for the long CMs 15local and 24local was greater than that for the short CMs 08local and 26local. Figures 6e–f show the filtering results. The true positive candidates (TP) increased by 4 and 6 for CMs 24local and 26local, and their true negative candidates (TN) increased by 187 and 94, respectively. Although the improvement after calibration is relatively small compared with the total number of isomeric candidates, without calibration these candidates (e.g., those with special retention behavior on the output CM) would routinely be incorrectly identified or omitted in practical applications. It is important to note that after calibration ~30% of the false candidates were accepted for all CMs and ~25% of the true identities were incorrectly filtered, for reasons that include larger prediction errors from the ANN model and small differences in RT among these isomeric candidates using conventional LC methods.
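Choosing an error threshold from a ROC curve can be sketched as below. The selection criterion used here, Youden's J (true positive rate minus false positive rate), is an assumption; the criterion actually applied is described in Methods. A candidate is accepted when its relative RT error is at or below the threshold, and all values are hypothetical.

```python
import numpy as np

def best_error_threshold(errors, is_true):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    errors  : relative RT error (%) of each candidate
    is_true : True where the candidate is the correct identity
    """
    errors = np.asarray(errors, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(errors):  # each observed error is a candidate threshold
        accepted = errors <= t
        tpr = np.sum(accepted & is_true) / np.sum(is_true)
        fpr = np.sum(accepted & ~is_true) / np.sum(~is_true)
        if tpr - fpr > best_j:
            best_t, best_j = float(t), float(tpr - fpr)
    return best_t, best_j

# Hypothetical relative RT errors (%) for four true and four false candidates.
errors = [1.0, 3.0, 4.0, 15.0, 9.0, 12.0, 20.0, 30.0]
labels = [True, True, True, True, False, False, False, False]
threshold, j_stat = best_error_threshold(errors, labels)
```

With these made-up values, a 4% threshold accepts three of the four true candidates and none of the false ones.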
The isomeric candidates were ranked based on their RT error before and after calibration. Figure 6g shows the ranking results for each CM. Before calibration, the number of true identities among the top 6 candidates (Ntop6) ranged from 80 to 101 for all CMs, while after calibration, it was in all cases above 97 (71.9% of all identities). In particular, for CM 26local, the Ntop1 value increased from 15 to 32 (113%). In addition, the sum of all true identity rankings was improved by 12 and 20% for CM 24local (719th vs. 631st) and CM 26local (848th vs. 676th), respectively. No significant improvements were observed for CMs 08local and 15local. Figure 6h shows an example of improved ranking with calibration. The correct identity, vardenafil, was ranked 2nd among 25 isomeric candidates when rpRT (15 min) was compared with the predicted RT (14.6 min), while it dropped to 20th when the experimental RT (10.2 min) was used for comparison. These results demonstrate that rpRT significantly improved the ranking of correct identities for those output CMs with mobile phases different from that of the input CM.
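The ranking step can be illustrated with a small example. The sketch assumes that candidates are ranked by the absolute difference between their predicted RT and a reference RT (either rpRT or the experimental RT). The reference values of 15.0 and 10.2 min and the predicted RT of 14.6 min echo the vardenafil example, but the other candidate RTs are hypothetical, so the resulting ranks are illustrative only.

```python
import numpy as np

def rank_of_true(candidate_pred_rts, reference_rt, true_index):
    """Rank (1 = best) of the true identity when candidates are sorted by RT error."""
    errors = np.abs(np.asarray(candidate_pred_rts, dtype=float) - reference_rt)
    order = np.argsort(errors, kind="stable")  # smallest error first
    return int(np.where(order == true_index)[0][0]) + 1

# Hypothetical predicted RTs (min) of five isomeric candidates on the output CM;
# index 1 is the true identity (predicted RT 14.6 min).
pred_rts = [9.8, 14.6, 11.0, 15.1, 10.5]

rank_with_rprt = rank_of_true(pred_rts, reference_rt=15.0, true_index=1)
rank_with_exp = rank_of_true(pred_rts, reference_rt=10.2, true_index=1)
```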