3.1 Optimization of HPLC conditions and method validation
The mobile phase was investigated, including the separation performance of methanol versus acetonitrile and of phosphoric acid versus formic acid, together with the influence of column temperature. The gradient elution program and flow rate were then optimized. The selected chromatographic conditions gave good resolution, symmetrical peak shapes, and a reasonable analysis time. Chromatograms of the RSs and samples were collected on 22 columns under the optimized conditions. Representative chromatograms and spectra are shown in Figures 2 and 3.
Method validation experiments were performed on the Agilent Zorbax SB C18 column. Precision (n = 6), stability (12 h, n = 6), and repeatability (n = 6) were tested. The RSDs of both the tR values and the peak areas of the 11 peaks were less than 3%, meeting the requirements of fingerprint analysis.
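As a minimal sketch of how such an RSD check can be computed, the snippet below uses the sample standard deviation divided by the mean. The function name `rsd_percent` and the replicate values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical retention times (min) from six replicate injections
t_r = [12.31, 12.29, 12.33, 12.30, 12.32, 12.31]
print(f"RSD of tR = {rsd_percent(t_r):.2f}%")
print("meets the 3% criterion:", rsd_percent(t_r) < 3.0)
```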
3.2 Initialization for the DRS method
Because columns 1 to 17 could effectively separate the 11 peaks of the samples, the data from these columns were used to initialize the model step by step, as shown in Figure 4. The first step was data import. The chromatographic data and the corresponding spectral data of the samples on columns 1 to 17 were imported into the software, and integration operations such as adding and deleting peaks were performed. The chromatographic data were in ANDI format, with the file name extension “.cdf”. The spectral data were in extended ANDI format, with the file name extension “.nc”. The PDA data were optional. The second step was peak assignment. The names of the 11 compounds were entered into the software, and the corresponding peaks on the 17 columns were then matched one-to-one with the compounds (the red box in Figure 5). The third step was setting the chromatographic qualitative method, taking LCTRS as an example. The tR window of each peak was set to 1 min; a peak could be identified if its tR deviation was ≤ the tR window. In this study, peak 1 and peak 9 were selected as the two reference compounds, as shown in the green box of Figure 5 (it is recommended to select peaks at or close to the first and last peaks). Because spectral data were available in the present study, the fourth step was to establish a spectral qualitative method. As shown in the blue box in Figure 5, the synthesized spectrum was selected as the spectral matching method, and the similarity threshold was set to 0.95.
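The third step can be illustrated with a short sketch. It assumes that LCTRS predicts tR on a new column from a straight line through the two reference compounds (standard retention time, StR, against measured tR), and that a compound is assigned to the nearest measured peak when the deviation is within the tR window. The function names, the nearest-peak rule, and all numbers are illustrative assumptions, not the software's implementation.

```python
def lctrs_predict(str_values, ref_a, ref_b, t_ref_a, t_ref_b):
    """Predict tR on a new column from StR via a two-point straight line."""
    slope = (t_ref_b - t_ref_a) / (str_values[ref_b] - str_values[ref_a])
    intercept = t_ref_a - slope * str_values[ref_a]
    return {name: slope * s + intercept for name, s in str_values.items()}


def identify(predicted, measured_peaks, window=1.0):
    """Assign each compound to the nearest measured peak if within the window."""
    assigned = {}
    for name, t_pred in predicted.items():
        nearest = min(measured_peaks, key=lambda t: abs(t - t_pred))
        assigned[name] = nearest if abs(nearest - t_pred) <= window else None
    return assigned


# Hypothetical StR values (min); peak 1 and peak 9 act as reference compounds
str_values = {"peak 1": 10.0, "peak 5": 25.0, "peak 9": 40.0}
predicted = lctrs_predict(str_values, "peak 1", "peak 9", 11.0, 44.0)
print(identify(predicted, [11.0, 27.9, 44.0, 52.0], window=1.0))
```

With a 1 min window, the hypothetical peak 5 (predicted near 27.5 min) is matched to the measured peak at 27.9 min; with a narrower window it would be left unidentified.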
3.3 Optimization and evaluation of DRS method
3.3.1 Selection of reference compound
Because the choice of reference compounds can significantly affect the accuracy with which the RRT and LCTRS methods calculate tR, this choice needed to be optimized. According to our previous studies [14, 34], the general principles for selecting reference compounds for the RRT and LCTRS methods were as follows: the tR coverage of the reference compounds should be 50% to 100%, and their non-linear deviation should be sufficiently small. The tR coverage reflects the relative position of the reference compound(s) between the first and the last compound. For the LCTRS method and the RRT method, the coverage was calculated by formulas (2) and (3), respectively. Because the overall quality control method involved many marker compounds, a large amount of calculation was still required, even when following the above principles, to find the optimal reference compounds for the sample under given chromatographic conditions.
In formula (2), tR2 is the tR (or StR) of the second reference compound; tR1 is the tR (or StR) of the first reference compound; tRlast is the tR (or StR) of the last compound; and tRfirst is the tR (or StR) of the first compound.
In formula (3), tRreference is the tR of the reference compound; tRlast is the tR of the last compound; and tRfirst is the tR of the first compound.
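The two coverage formulas themselves do not appear in this text. Based only on the symbol definitions above and on the description of coverage as the relative position of the reference compound(s) between the first and last compounds, one plausible form is sketched below. It is an assumption inferred from those definitions, not a transcription of the original formulas.

```latex
% Assumed form of formula (2): coverage for the LCTRS method (two reference compounds)
\mathrm{Coverage}_{\mathrm{LCTRS}}
  = \frac{t_{R2} - t_{R1}}{t_{R\,\mathrm{last}} - t_{R\,\mathrm{first}}} \times 100\%

% Assumed form of formula (3): coverage for the RRT method (one reference compound)
\mathrm{Coverage}_{\mathrm{RRT}}
  = \frac{t_{R\,\mathrm{reference}} - t_{R\,\mathrm{first}}}{t_{R\,\mathrm{last}} - t_{R\,\mathrm{first}}} \times 100\%
```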
In the present study, the 11 marker compounds gave a total of 55 possible reference compound pairs, of which about 20 had a tR coverage of more than 50%. The software's method optimization function provided the 10 reference compound pairs with the highest accuracy, as shown in Table 2. For the initially selected pair, peak 1 and peak 9, the tR deviation (average deviation of 11 peaks on 17 columns) was 0.304 min and the identification rate was 99.5%, ranking 9th. The best pair was peak 3 and peak 9, with a tR deviation of 0.258 min and an identification rate of 99.5%. Compared with the initial pair, the optimal pair reduced the deviation by 0.046 min.
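The count of 55 pairs is the number of ways to choose 2 of 11 compounds. The sketch below enumerates the pairs and keeps those with at least 50% coverage, assuming coverage is the span between the two reference peaks divided by the span from the first to the last peak. The retention times are hypothetical, so the number of eligible pairs it prints is not the study's value.

```python
from itertools import combinations


def pair_coverage(t_r, i, j):
    """Share of the first-to-last tR span covered by reference peaks i and j."""
    return abs(t_r[j] - t_r[i]) / (t_r[-1] - t_r[0])


# Hypothetical standard retention times (min) of 11 marker peaks, in elution order
t_r = [5.0, 8.0, 11.0, 15.0, 19.0, 24.0, 28.0, 33.0, 38.0, 42.0, 50.0]

pairs = list(combinations(range(len(t_r)), 2))
eligible = [(i, j) for i, j in pairs if pair_coverage(t_r, i, j) >= 0.5]

print("candidate pairs:", len(pairs))
print("pairs with coverage >= 50%:", len(eligible))
```

Each eligible pair would then be scored by its average tR deviation and identification rate across the columns, which is the ranking the optimization function reports.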
3.3.2 Adjustment of tR window
A smaller tR window makes the method more accurate but reduces the number of applicable columns. The optimal tR window could be determined from the statistical results in the software's method optimization function. According to Table 3, which lists the average tR deviation of each peak on the 17 columns, the average tR deviations of peaks No. 1 to 10 were less than 0.3 min, whereas that of No. 11 was 0.6 min. Therefore, a tR window of 0.8 min appeared appropriate to cover the tR deviations of all peaks.
To verify this value, different tR windows were tested; the tR deviations (average deviation of 11 peaks) and identification rates on different columns are summarized in Table 4 and Figure 6. Windows of 0.3 min and 0.5 min were too narrow: the identification rate was less than 93%, and the proportion of available columns was less than 53%. With windows of 1.5 min and 2.0 min, the identification rates and the proportions of available columns exceeded 99% and 94%, respectively, but such wide windows carried a risk of misjudgment. Windows of 0.8 min and 1.0 min were near the inflection point and offered a good balance between accuracy and applicability. Finally, 0.8 min was selected.
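The window scan described above can be sketched as follows. The sketch assumes that a column counts as available only when every peak falls within the window, and it uses a small hypothetical deviation table rather than the data behind Table 4.

```python
def window_stats(deviations, window):
    """Return (identification rate, share of columns with every peak identified)."""
    hits = [[abs(d) <= window for d in column] for column in deviations]
    n_peaks = sum(len(column) for column in hits)
    rate = sum(sum(column) for column in hits) / n_peaks
    usable = sum(all(column) for column in hits) / len(hits)
    return rate, usable


# Hypothetical tR deviations (min): 3 columns x 4 peaks
deviations = [
    [0.10, 0.20, 0.25, 0.60],
    [0.05, 0.15, 0.40, 0.75],
    [0.20, 0.30, 0.35, 1.20],
]

for w in (0.3, 0.5, 0.8, 1.0, 1.5, 2.0):
    rate, usable = window_stats(deviations, w)
    print(f"window {w:.1f} min: identified {rate:.0%}, usable columns {usable:.0%}")
```

Plotting both quantities against the window width shows where further widening stops adding columns, which is the inflection point referred to in the text.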
A separate tR window can also be set for each peak. For example, a window of 0.8 min could be set for peak 11 and 0.5 min for the other peaks. In this study, smaller tR windows were used for the other peaks, which further improved the accuracy of the method and reduced the misjudgment rate.
When the PDA spectral qualitative function was available, the tR window could be widened. In the current study, it was set to 1.5 min according to the results in Table 4. In our previous studies, the tR window was set to 0.5 min [13], 0.6 min and 1.2 min [14], 0.3 min [15], and 0.7 min [18]. Therefore, when only the chromatographic qualitative function is used, a tR window of 0.5 to 1.0 min is recommended; when the PDA spectral function is also available, the range can be widened to 0.5 to 1.5 min.
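A per-peak window amounts to a lookup with a default value. The minimal sketch below uses the example values from the text (0.8 min for peak 11, 0.5 min otherwise); the names `PEAK_WINDOWS`, `DEFAULT_WINDOW`, and `within_window` are hypothetical.

```python
DEFAULT_WINDOW = 0.5      # min, example value for most peaks
PEAK_WINDOWS = {11: 0.8}  # min, wider example window for peak 11


def within_window(peak_no, deviation):
    """True if the absolute tR deviation fits the window assigned to this peak."""
    return abs(deviation) <= PEAK_WINDOWS.get(peak_no, DEFAULT_WINDOW)


print(within_window(11, 0.6))  # peak 11 uses the 0.8 min window
print(within_window(3, 0.6))   # other peaks use the 0.5 min default
```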
3.3.3 Comparison of different methods
The software provided four methods for peak identification: the RRT method, the LCTRS method, RRT combined with PDA, and LCTRS combined with PDA. The conditions of the four methods, optimized according to Sections 3.3.1 and 3.3.2, are shown in Table 5.
Taking Col15 (SunFire C18) as an example, Figures 7A and 7B show the results of the RRT method and the LCTRS combined with PDA method, respectively. The peak identification results in the red box indicated that Salvianolic acid B was incorrectly identified as Salvianolic acid L by the RRT method, and that the peaks of Salvianolic acid L and Salvianolic acid Y could not be identified because of their large tR deviations. In contrast, the LCTRS combined with PDA method accurately identified all peaks. The green box shows the tR deviation of each peak and the PDA similarity, the blue box the linear fitting results of tR, and the yellow box the PDA spectra. This case suggested that LCTRS combined with PDA was superior to the RRT method.
The results of the four optimized methods on columns 1 to 17 are summarized in Table 6. In terms of the number of positive columns (tR deviation ≤ tR window and/or PDA similarity ≥ similarity threshold), the LCTRS combined with PDA method was the best, with the smallest average tR deviation, the highest identification rate, and the largest number of available columns. When only the chromatographic algorithm was used, the LCTRS method ranked the highest.
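The benefit of adding a spectral check can be sketched as a two-stage filter: keep the measured peaks inside the tR window, then accept the one whose spectrum is most similar to the reference spectrum, provided the similarity reaches the threshold. Cosine similarity is used here as an assumed metric, since the software's similarity calculation is not specified in the text; the function names and spectra are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra sampled at the same wavelengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_with_pda(t_pred, ref_spectrum, peaks, window=1.5, threshold=0.95):
    """peaks: list of (tR, spectrum). Return the tR of the best match, or None."""
    candidates = [
        (cosine_similarity(ref_spectrum, spectrum), t)
        for t, spectrum in peaks
        if abs(t - t_pred) <= window
    ]
    candidates = [c for c in candidates if c[0] >= threshold]
    return max(candidates)[1] if candidates else None


# Hypothetical case: the peak nearest in tR has the wrong spectrum
reference = [0.20, 0.90, 0.40]
peaks = [(20.1, [0.90, 0.20, 0.10]), (20.9, [0.21, 0.88, 0.41])]
print(identify_with_pda(20.3, reference, peaks))
```

In this hypothetical case a purely chromatographic rule would pick the peak at 20.1 min because it is closer, whereas the spectral check selects the peak at 20.9 min.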
3.4 Sample tests
Because the Salvianolic acid D and Salvianolic acid E peaks overlapped in the chromatograms on columns 18 to 22, these columns were used for sample testing rather than method establishment. Sample testing comprised three steps. First, the chromatographic and spectral data were imported, and the peaks were integrated. Second, the reference compounds (peak 3 and peak 9) in the sample chromatogram were assigned. Third, the method was run to obtain the results. The sample test results were displayed in the same way as in Figure 7, including the peak qualitative results, qualitative result tables, linear fitting results, and spectra. The peak qualitative results of the four methods on the Agilent TC-C18 (2) column are shown in Figure 8. Figure 8A shows the results of the RRT method, which had the smallest tR deviation (0.110 min); nevertheless, the Salvianolic acid B peak was unidentified, and the Salvianolic acid L and Salvianolic acid Y peaks were incorrectly identified. Figure 8B shows the results of the LCTRS method, which had the second smallest tR deviation (0.280 min); the Salvianolic acid L peak was correctly identified, but the Salvianolic acid Y peak was incorrectly identified. RRT combined with PDA (Figure 8C) and LCTRS combined with PDA (Figure 8D) gave the same identification results: both correctly identified the Salvianolic acid L and Salvianolic acid Y peaks. Of these two, LCTRS combined with PDA had the smaller tR deviation (0.293 min). Table 7 summarizes the comparison of the four methods on the five columns. The RRT method remained the worst, with the lowest identification rate of 72.7%, whereas LCTRS combined with PDA remained the optimal method, with a smaller tR deviation of 0.240 min and the highest identification rate of 80.0%.
3.5 Column recommendation by database
During the development of an HPLC analysis method, a large amount of chromatographic data is generally collected on different columns. However, the legal standard method provides only the column type, such as C18; the column brand and the related chromatograms are not given. These data are nevertheless valuable, and the differences between the more useful data (for example, better separation, shorter separation time, smaller tR deviation, or lower column cost) and ordinary data are also meaningful. Therefore, following a big-data approach, the available data were stored as part of the DRS and used for column recommendation.
Positive and negative columns were defined for column recommendation. Positive columns were those on which all peaks could be effectively separated and identified; negative columns were those on which some peaks could not be separated or identified. In this study, the 11 compounds could not be effectively separated on column 21; therefore, this column was considered a negative column for all four methods (Figure 8). Column 15 was a positive column for the LCTRS combined with PDA method (Figure 7B) but a negative column for the RRT method because of the large retention time deviation of certain compounds (Figure 7A). To improve the reproducibility of the analysis method, future studies should choose positive columns rather than negative ones. For columns that are on neither list, the results, chromatographic data, and PDA spectra are also meaningful and can be used to upgrade and improve the DRS method. The positive and negative columns differ for different medicines, different chromatographic conditions, and even different peak identification methods for the same medicine. The lists of positive and negative columns for the phenolic acid extract of S. miltiorrhiza under the four methods are shown in Table 8, and more detailed information is available in the software database.
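The positive/negative definition can be expressed as a simple classification per identification method: a column is positive for a method only if every peak was identified on it. The sketch below uses a hypothetical results structure and shortened peak lists; it is not the layout of the software database.

```python
def classify_columns(results):
    """results: {column: {method: [bool per peak]}} -> {method: (positive, negative)}."""
    summary = {}
    for column, by_method in results.items():
        for method, flags in by_method.items():
            positive, negative = summary.setdefault(method, ([], []))
            (positive if all(flags) else negative).append(column)
    return summary


# Hypothetical per-peak identification flags (shortened to three peaks)
results = {
    "Col15": {"RRT": [True, False, True], "LCTRS+PDA": [True, True, True]},
    "Col21": {"RRT": [True, False, False], "LCTRS+PDA": [True, True, False]},
}

for method, (positive, negative) in classify_columns(results).items():
    print(f"{method}: positive={positive}, negative={negative}")
```

Because the lists are kept per method, the same column can appear as positive for one method and negative for another, as described for column 15.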