Qualitative identification of Lonicerae japonicae flos in traditional Chinese medicine using metabarcoding combined with specific mini-barcodes

Lonicerae japonicae flos, also known as Jinyinhua (JYH), is an important component of traditional Chinese patent medicine (TCPM) products. However, the potential for adulteration and substitution with low-quality materials highlights the need for a reliable and sensitive approach to identify the species composition of TCPM products for consumer safety. We used universal ITS2 primers to amplify TCPMs containing JYH. However, the results were inconclusive, as only one operational taxonomic unit (OTU) was identified as Lonicera sp., which could not be identified at the species level. To confirm the species identification of Lonicera sp. in TCPM, we developed a short mini-barcode primer based on the psbA-trnH region, which, in combination with DNA metabarcoding technology, allowed for qualitative and quantitative analysis of artificially mixed samples. We applied the mini-barcode to distinguish TCPMs containing JYH and demonstrated its relatively accurate quantitative ability in identifying two Lonicera species. Our study presents a method for qualitative and quantitative identification of JYH, providing a promising application of DNA metabarcoding technology in the quality control of TCPM products.


Introduction
Lonicerae japonicae flos, commonly known as Jinyinhua (JYH) in the Chinese Pharmacopeia, refers to the dried flower buds or initial flowers of Lonicera japonica Thunb. Through extensive phytochemical studies, JYH has been found to contain a plethora of chemical components, including oils, organic acids, flavonoids, saponins, and others [1].
among the five species mentioned above, coupled with the higher market value of JYH, have led to widespread adulteration with Shanyinhua for economic gain, posing significant challenges for quality control [11].
In the case of decoction pieces, morphological features and microscopic identification can effectively detect adulterants [12]. However, when adulteration occurs in TCMs, these methods can no longer resolve it. Consequently, chemical methods such as fingerprinting have been proposed as more efficient alternatives for identifying adulterants. For instance, Mentha canadensis Linnaeus and Mentha spicata Linnaeus in Xiaoyao Pills can be distinguished using fingerprinting [13]. However, the specificity of fingerprinting for distinguishing JYH from Shanyinhua in TCMs is limited because it requires highly specific quality markers. Furthermore, although chlorogenic acid and luteoloside are used as quality markers for JYH identification in the Chinese Pharmacopeia 2020 Edition, the chlorogenic acid content of Shanyinhua is higher than that of JYH [14]. Thus, there is an urgent need for more efficient techniques to identify the species composition of JYH-containing TCPMs.
Since Paul Hebert first proposed the concept of DNA barcoding in 2003 [15], this technology has been widely used for species identification of various organisms, including animals, plants, and fungi [16]. In recent years, the emergence of DNA metabarcoding, a technology that combines high-throughput sequencing and DNA barcoding, has enabled the simultaneous detection of multispecies barcode sequences in mixed samples [17]. This approach has been successfully applied to the discovery of biodiversity through environmental DNA [18], bulk DNA from multiple individuals [19], and stool DNA from various sources [20]. In the field of TCMs, selecting appropriate barcodes is a critical step in DNA metabarcoding of TCPM. Universal primers have been widely used due to their comprehensive database and mature methodology. There have been reports on the application of DNA metabarcoding using universal primers to analyze biological components of herbal preparations, such as Liuwei Dihuang pills [21], Jiuwei Qianghuo Wan [22], and Wuhu San [23].
However, the DNA metabarcoding approach has been shown to have amplification preferences, which can lead to the identification of species that are not contained in those TCPMs. Moreover, because the DNA in TCPMs is often degraded, the relatively long fragments targeted by universal primers result in a low success rate of PCR amplification and stronger primer preference. To address these issues, mini-barcoding has emerged as a promising alternative to universal primers, helping to overcome amplification biases and enabling better species-level resolution [24][25][26][27].
In this study, we employed a novel approach that combines metabarcoding technology with mini-barcode methods to perform qualitative and quantitative analysis of JYH in TCPMs. Our study also proposes a new strategy for addressing adulteration in TCPMs, in which metabarcoding with mini-barcodes can improve the specificity and sensitivity of adulterant detection.

Samples collection and preparation
To obtain the ITS2 marker sequence for TCPMs, a total of 29 commonly used herbal medicines containing JYH were collected (Table S1). These samples comprised 56 plant ingredients, and it was ensured that the ITS2 sequences of all labeled plants in the analyzed herbal products were available in the NCBI/GenBank database (http://www.ncbi.nlm.nih.gov/genbank) (Supplementary Table S2).
To test the effectiveness of our mini-barcode in identifying Lonicera japonica (JYH) and Lonicera macranthoides (a common source species of Shanyinhua), we purchased samples of both species from the Anguo Medicinal Herb Market. The samples were authenticated by Professor Xiaoxuan Tian at the Institute of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine. Subsequently, DNA was extracted from the samples, and PCR amplification was performed targeting the psbA-trnH sequence. The PCR products were then sent to a sequencing company for Sanger sequencing. The sequencing results confirmed that the source species of the two samples were indeed Lonicera japonica and Lonicera macranthoides, respectively (Figures S2 and S3). To evaluate the ability of our mini-barcode to qualitatively and quantitatively identify these two species, we prepared 5 artificial communities by mixing the two species in different proportions in 10 g increments (Table S3), with a minimum Shanyinhua proportion of 10%. Ten TCPMs were then randomly selected from the original 29 samples for further testing with the mini-barcode; these are marked with an asterisk in Table S1.

Development and evaluation of mini-barcode primers
To develop the mini-barcode for the identification of Lonicera species, we downloaded psbA-trnH nucleotide sequences of several Lonicera species from GenBank, including Lonicera japonica, Lonicera hypoglauca, Lonicera confusa, Lonicera macranthoides, and Lonicera fulvotomentosa. Multiple sequence alignment was performed using MAFFT 7.0, and Geneious was used to select highly variable regions as candidate regions for mini-barcode development. We designed primers using Primer Premier V6.0 with the following parameters: product size between 100 and 250 bp, primer size between 18 and 30 bp, melting temperature (Tm) between 40 and 70 °C, and GC content between 20 and 70%. To ensure quality and specificity, we verified the primers using Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and evaluated their physicochemical properties using Oligo7 software, rejecting those with hairpin structures, primer dimers, or excessive annealing temperatures.

DNA extraction, amplification, and high throughput sequencing
The total genomic DNA from 100 mg of each TCPM sample was extracted using the Plant Genomic DNA kit (Tiangen Biotech Co., Ltd., Beijing, China), with an extraction blank included for each DNA extraction. Additionally, 30 mg of material from each artificial sample was used for DNA preparation to further test the mini-barcode. The quality of the total DNA was assessed using 1% agarose gels and the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA).
The amplification of the target region for each sample was performed using primers [28] with unique tags (see Tables S1 and S3 for details). For the 29 TCPM samples, the amplification was carried out using the universal primers ITS2F-ATGCGATACTTGGTGTGAAT and ITS3R-GACGCTTCTCCAGACTACAAT. Meanwhile, for the 5 artificial communities and the 10 TCPM samples marked with an asterisk in Table S1, the mini-barcode primers ptF1-GTCAATCTTCTTATTTTTAG and ptR1-ACAATTATAGATAAGTCAGC were employed for amplification. To minimize the impact of stochastic errors that may arise during library construction and sequencing, the procedure for the 5 artificial communities and mini-barcode samples was repeated once more.
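As a quick sanity check, the two mini-barcode primers can be compared against the design limits stated for primer development (18 to 30 bp, GC content 20 to 70%, Tm 40 to 70 °C). The sketch below is illustrative only: the Tm estimate uses the simple Wallace rule, which is an assumption made here and is not necessarily the method implemented in Primer Premier, and the function names are hypothetical.

```python
# Design limits as stated in the primer-development parameters of the text.
LENGTH_RANGE = (18, 30)    # primer length in bp
GC_RANGE = (20.0, 70.0)    # GC content in percent
TM_RANGE = (40.0, 70.0)    # melting temperature in degrees Celsius

# Mini-barcode primer sequences from the text.
PRIMERS = {
    "ptF1": "GTCAATCTTCTTATTTTTAG",
    "ptR1": "ACAATTATAGATAAGTCAGC",
}


def gc_percent(seq):
    """Percentage of G and C bases in the primer."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough Tm estimate, 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is used only for illustration.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2.0 * at + 4.0 * gc


def within(value, bounds):
    low, high = bounds
    return low <= value <= high


def check_primer(seq):
    """Summarize a primer and report whether it meets all three limits."""
    length, gc, tm = len(seq), gc_percent(seq), wallace_tm(seq)
    passes = (within(length, LENGTH_RANGE)
              and within(gc, GC_RANGE)
              and within(tm, TM_RANGE))
    return {"length": length, "gc_percent": gc, "tm_wallace": tm, "passes": passes}


for name, seq in PRIMERS.items():
    print(name, check_primer(seq))
```

Under these assumptions, both primers fall inside the stated ranges; a dedicated tool should still be used for hairpin and dimer screening.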
For each 50 µl reaction, the final PCR components included 1 µl of each primer, 1 µl Tks Gflex DNA Polymerase, 2 µl template, 20 µl dd-water, and 25 µl of 2 × Gflex PCR Buffer. The cycling protocol consisted of an initial denaturation at 94 °C for 60 s, followed by a maximum of 40 cycles of 98 °C for 10 s, 50 °C for 15 s, and 68 °C for 30 s, and a final extension at 68 °C for 5 min. To ensure accuracy, both a PCR negative control and an extraction blank control were included in each amplification. The PCR products were then assessed on 1% agarose gels, and their concentrations were measured using the Qubit® DNA Assay Kit in the Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). Subsequently, PCR products were pooled at equimolar concentrations to form three sequencing libraries: the ITS2 amplification products formed one library, and the mini-barcode amplification products, which were prepared in two replicates, formed the other two. Finally, the ITS2 library was sequenced on the Illumina MiSeq platform (2 × 300 bp reads), and the mini-barcode libraries were sequenced on the Illumina NovaSeq platform (2 × 150 bp reads).
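The equimolar pooling step can be illustrated with a short calculation. The sketch below converts a fluorometric concentration (ng/µl) into a molar concentration using the common approximation of 660 g/mol per base pair of double-stranded DNA, then computes the volume of each product that contains the same number of molecules. The sample names, concentrations, amplicon lengths, and the 50 fmol target are all hypothetical values chosen for illustration and do not come from the study.

```python
AVG_MASS_PER_BP = 660.0  # approximate g/mol per base pair of double-stranded DNA


def to_nanomolar(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/ul to nM for a given amplicon length."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * length_bp)


def equimolar_volumes(products, fmol_per_product):
    """Volume (ul) of each PCR product that contains fmol_per_product femtomoles.

    products maps a sample name to (concentration in ng/ul, amplicon length in bp).
    Because 1 nM equals 1 fmol/ul, volume = fmol / nM.
    """
    return {
        name: fmol_per_product / to_nanomolar(conc, length)
        for name, (conc, length) in products.items()
    }


# Hypothetical readings and amplicon lengths (primers and tags included).
example_products = {
    "sample_A": (12.0, 160),
    "sample_B": (4.0, 160),
    "sample_C": (8.0, 170),
}

for name, volume in equimolar_volumes(example_products, fmol_per_product=50.0).items():
    print(f"{name}: {volume:.2f} ul")
```

Products with lower concentrations contribute proportionally larger volumes, so each tagged sample is represented by a similar number of amplicon molecules in the library.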

Data analysis
To process the sequencing data, the reads from different ITS2 amplification products were separated based on tags using fastx_barcode_splitter from the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Paired-end reads were then merged using FLASH [29], and primer sequences were trimmed using Cutadapt [30]. The resulting sequences were further processed using usearch10 [31,32] for denoising, quality control, and clustering of sequences with ≥ 99% similarity into operational taxonomic units (OTUs). Consensus sequences of the OTUs were queried using BlastN against the GenBank nucleotide database, and the output was imported into MEGAN Community Edition version 6.10.8 [33]. The lowest common ancestor (LCA) parameters were set as follows: Min Score: 50.0, Max Expected: 0.01, Min Percent Identity: 100%, Top Percent: 10.0, Min Support Percent: 0.01, Min Support: 1, and LCA Algorithm: weighted. To minimize the impact of tag jumping, OTUs with reads that accounted for less than 0.01% of the total reads in each sample were removed. This ensured that only high-quality, reliable data were used for downstream analyses. The OTU table and the taxonomic information were then obtained.
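The per-sample abundance filter used against tag jumping can be sketched as follows. This is a minimal illustration, assuming the OTU table is held as a nested dictionary of read counts; the data structure, the function name, and the example counts are hypothetical, and only the 0.01% threshold comes from the text.

```python
MIN_FRACTION = 0.0001  # 0.01% of the total reads in a sample, as stated in the text


def filter_low_abundance(otu_table, min_fraction=MIN_FRACTION):
    """Drop OTUs whose reads make up less than min_fraction of a sample's reads.

    otu_table maps sample name -> {OTU id -> read count}.
    The threshold is applied to each sample separately.
    """
    filtered = {}
    for sample, counts in otu_table.items():
        total = sum(counts.values())
        filtered[sample] = {
            otu: reads
            for otu, reads in counts.items()
            if total > 0 and reads / total >= min_fraction
        }
    return filtered


# Hypothetical counts: OTU_c has 1 of 20,000 reads (0.005%) and is removed.
example_table = {
    "sample_1": {"OTU_a": 19989, "OTU_b": 10, "OTU_c": 1},
}
print(filter_low_abundance(example_table))
```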
The data from the 5 artificial communities and the mini-barcode amplification products were split by tag using fastq-multx [34], and the paired-end reads were merged using QIIME [35]. Denoising and quality control were performed using usearch10, and sequences with 100% similarity were grouped into a single OTU. The remaining data processing steps were the same as those described above. Finally, the relationship between the read proportions of Lonicera japonica and Lonicera macranthoides and their actual mixing proportions in the 5 artificial communities was analyzed.
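Grouping sequences at 100% similarity amounts to collapsing identical reads and counting them. The sketch below shows only that exact-match grouping step on toy reads; it does not reproduce the denoising or quality control performed by usearch10, and the read sequences and the `ASV_n` labels are hypothetical.

```python
from collections import Counter


def group_identical_reads(reads):
    """Collapse identical sequences and return (label, sequence, count) tuples.

    Labels are assigned in order of decreasing abundance.
    """
    counts = Counter(seq.upper() for seq in reads)
    return [
        (f"ASV_{rank}", seq, n)
        for rank, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


# Toy reads, not real psbA-trnH sequences.
toy_reads = ["ACGTAC", "ACGTAC", "acgtac", "ACGTTC", "ACGTTC"]
for label, seq, n in group_identical_reads(toy_reads):
    print(label, seq, n)
```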
2 and 4 plants in Qingguo Wan, Lingqiao Jiedu Wan, Lidan Pian, and Yinhu Ganmao San, respectively. In the Fufang Yuxingcao Pian 2 and Fufang Yuxingcao Pian 3 samples, none of the labeled species was detected (Table S6). To analyze the plant component proportions, the relative abundance of each plant in a sample was calculated as the number of reads assigned to that plant species divided by the total number of reads in the sample. The proportions of plant components detected in each sample differed markedly from those in the TCPM ingredient list (Fig. 1).
We identified two OTUs (OTU_142 and OTU_58) as Lonicera sp. in the TCPM samples; Lonicera japonica could not be identified at the species level. OTU_142 showed 96.31% sequence similarity to both Lonicera japonica and Lonicera macranthoides in the GenBank database and could therefore only be assigned to Lonicera sp. (Figure S1). Furthermore, Lonicera sp. was detected in only six samples with 195 reads, which differed from the expected proportion based on the TCPM ingredient list (Table S6). These findings suggest that ITS2 may not be suitable as an identification marker for Lonicera japonica. Additional investigation is necessary to confirm the species identification of Lonicera sp. in TCPM.

Evaluation of mini-barcode for species identification
A pair of mini-barcode primers, designated ptF1-ptR1, was developed to amplify the highly variable psbA-trnH region of JYH and Shanyinhua species, with a primer length of 20 bp and a mini-barcode fragment size of 108-129 bp (Table 1). The Neighbor-Joining (NJ) tree constructed from the mini-barcode fragments revealed a clear distinction between JYH and Shanyinhua species (Fig. 2).

Qualitative identification capability of DNA minibarcodes
The five artificial communities and the mini-barcode samples were amplified using our ptF1-ptR1 primers, yielding 6,734,480 reads. Sequences with 100% similarity were then consolidated into two Amplicon Sequence Variants (ASVs) (Table S7, Table S8). Lonicera japonica and Lonicera macranthoides were identified with 100% similarity against their corresponding barcode regions in the GenBank database. These results demonstrate the successful amplification of target sequences of Lonicera japonica and Lonicera macranthoides from complex samples using our primers. To assess the quantitative ability of the mini-barcode, the relationship between read abundance and the biomass ratio of the two species in the five artificial communities was analyzed (Fig. 3). A significant positive correlation

Analysis of TCPM by ITS2
In this study, 22 of the 29 TCPM samples were successfully sequenced, corresponding to a sequencing success rate of 75.86% (Table S4). From these samples, a total of 61,329 reads were obtained and clustered into 125 OTUs. A total of 65 plant species representing 52 genera were identified from these OTUs, with 29 OTUs identified only at the genus level (Table S5). For sample composition analyses, the relative abundance of each OTU was calculated as its read count divided by the total reads of the sample, and the average relative abundance was then calculated as the sum of the relative abundances of that OTU across all samples divided by 22 (Table S5). The top 10 plants by average relative abundance at the species level were Arctium lappa, Glycyrrhiza uralensis, Phaseolus vulgaris, Schizonepeta tenuifolia, Platycodon grandiflorus, Forsythia suspensa, Viola prionantha, Saposhnikovia divaricata, Sophora japonica, and Pistacia chinensis. Interestingly, among the top 10 plants, Phaseolus vulgaris, Sophora japonica, and Viola prionantha are not included on the TCPM labels, yet they were detected in 95.45%, 77.27%, and 81.82% of the 22 samples, respectively. Based on literature reports [36], it is likely that Viola prionantha is being used as a counterfeit for Viola yedoensis in TCM. Additionally, we speculate that Phaseolus vulgaris and Sophora japonica may be used as adulterants for Sojae Semen Praeparatum and Scrophularia ningpoensis, respectively, as their morphology is similar. These findings highlight the importance of utilizing molecular methods to assess the composition of TCM products, as they can uncover potential adulteration and provide valuable information for quality control and consumer safety.
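The relative abundance, average relative abundance, and detection frequency described above can be written out as a small worked example. The sketch assumes the OTU table is a nested dictionary of read counts; the function names and the two-sample toy table are hypothetical and are not taken from Table S5.

```python
def relative_abundance(counts):
    """Reads of each OTU divided by the total reads of one sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {otu: reads / total for otu, reads in counts.items()}


def average_relative_abundance(otu_table):
    """Sum of an OTU's relative abundances across samples, divided by the
    number of samples (22 in the study). OTUs absent from a sample count as 0."""
    n_samples = len(otu_table)
    sums = {}
    for counts in otu_table.values():
        for otu, fraction in relative_abundance(counts).items():
            sums[otu] = sums.get(otu, 0.0) + fraction
    return {otu: total / n_samples for otu, total in sums.items()}


def detection_frequency(otu_table, otu):
    """Fraction of samples in which the OTU has at least one read."""
    hits = sum(1 for counts in otu_table.values() if counts.get(otu, 0) > 0)
    return hits / len(otu_table)


# Toy table with two samples and three OTUs.
toy_table = {
    "sample_1": {"OTU_A": 75, "OTU_B": 25},
    "sample_2": {"OTU_A": 50, "OTU_C": 50},
}
print(average_relative_abundance(toy_table))
print(detection_frequency(toy_table, "OTU_B"))
```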
Of the 65 species identified in the TCPMs, only 17 were present on the product labels (Table S2), and 16 were detected in the corresponding samples. Interestingly, some drugs, such as Glycyrrhizae Radix et Rhizoma and Taraxaci Herba, were derived from multiple source species, although all were authentic (Table S2). Additionally, Artemisia annua was detected as an extraneous species that was not listed for the corresponding Yinhu Ganmao San sample. Of the 16 labeled species detected in their corresponding samples, the number detected varied among products. For instance, 5 plants were detected in 11 products of the Yinqiao Jiedu formula, and 5 plants were detected in 2 products of the Jingzhi Yinqiao Jiedu formula. In 2 products of the Lianqiao Baidu formula, 7 and 8 plants were detected, respectively. Moreover, Forsythia suspensa was detected only in Fufang Yuxingcao Pian 1 of the Fufang Yuxingcao formula. Additionally, we detected 2, 4,

Discussions
Quality control at the source is a crucial aspect of TCPM research, especially for herbal medicines with multiple sources [37]. DNA metabarcoding has emerged as a powerful tool for identifying multi-source TCPMs [38]. Combining this technique with mini-barcodes allows for both qualitative and quantitative identification of specific species [39].
In our study, we employed DNA metabarcoding to identify the species origins of 22 TCPMs, revealing that 95% of these medicines contain ingredients from multiple sources. Our analysis of Glycyrrhizae Radix et Rhizoma, for example, uncovered two different species: Glycyrrhiza uralensis and Glycyrrhiza glabra. Notably, Yinqiao Jiedu pills contained only Glycyrrhiza uralensis, whereas the other 21 TCPMs contained both species, with a higher read proportion of Glycyrrhiza uralensis, which may indicate a PCR or sequencing preference or more widespread usage [40]. Additionally, our analysis
(R² = 0.9669) was observed between the read proportion and the species biomass proportion in the five artificial communities.

Application of mini-barcode in TCPM
The regression equations fitted to the 5 artificial communities in Fig. 3 were used to infer the true proportions of the two species in the TCPM samples (Fig. 4). According to Fig. 4, JYH accounted for more than 99% in 8 samples. Notably, in the Yinqiao Jiedu Pian and Niuhuang Qinggong Wan samples, Lonicera macranthoides accounted for 4.74% (5.47%) and 18.79% (11.90%), respectively, suggesting the presence of adulteration in these two samples. Using the mini-barcode improved the success rate of our experiment from 60 to 100% and allowed us to obtain quantitative results for Lonicera sp. in all 10 TCPMs. Specifically, eight of these herbal medicines contained less than 1% Lonicera macranthoides. Regarding this result, we first excluded the possibility of contamination during the experiment, as we included negative controls for both the DNA extraction and PCR amplification steps. In addition, we considered cross-talk [41], the phenomenon in which a read is assigned to an incorrect sample, which occurs at a rate of approximately 1.6% in many Illumina datasets [42]. Because proportions below 1% fall within this cross-talk rate, we conclude that there is no adulteration in these eight TCPMs.
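The calibration idea can be sketched as an ordinary least-squares fit of read proportion against biomass proportion in the artificial communities, followed by inversion of the fitted line to estimate the biomass proportion in an unknown sample. The calibration points below are hypothetical placeholders, not the values behind Fig. 3, and the function names are illustrative; the authors' exact fitting procedure is not specified in the text.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def infer_biomass_fraction(read_fraction, slope, intercept):
    """Invert the calibration line and clip the estimate to the range 0..1."""
    estimate = (read_fraction - intercept) / slope
    return min(1.0, max(0.0, estimate))


# Hypothetical calibration data for five artificial communities:
# biomass fraction of Lonicera macranthoides vs. its observed read fraction.
biomass_fractions = [0.10, 0.30, 0.50, 0.70, 0.90]
read_fractions = [0.05, 0.20, 0.40, 0.55, 0.80]

slope, intercept, r_squared = linear_fit(biomass_fractions, read_fractions)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r_squared:.4f}")

# Estimated biomass fraction for a sample with a hypothetical 10% read fraction.
print(f"{infer_biomass_fraction(0.10, slope, intercept):.3f}")
```

Clipping keeps estimates within a physically meaningful range when a read fraction falls outside the calibrated interval, although such extrapolated values should be treated with caution.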
The results of this study highlight the importance of quality control for TCPMs and provide valuable insights into the use of DNA metabarcoding and mini-barcode for identifying multi-source TCPMs and specific species. These technologies can overcome some of the shortcomings associated with traditional identification methods, such as subjective judgment and species too similar in appearance to be distinguished, thus improving the accuracy and reliability of identification. By using DNA metabarcoding and mini-barcode, we are able to better understand the sources and components of TCM formulations, providing better assurance for their quality and safety.
Regarding Lonicerae japonicae flos, the ITS2 sequence was detected in only six samples with 195 reads. We speculate that primer preference, rather than DNA degradation, was the main cause of this phenomenon. For instance, in Lidan tablets, the content ratio of Lonicera sp. to Astragalus sp. is 2:1, but the ratio of sequencing reads was only 1:126. To address this issue, we designed a specific mini-barcode based on the universal psbA-trnH [39,

Fig. 1 Proportion of plant ingredients identified in the TCPM label. The left histogram displays the biomass proportion of components detected in the TCPM ingredients list, while the right histogram repre-

Fig. 4 The actual proportion of Lonicera japonica and Lonicera macranthoides based on linear relationship

Table 1 Mini-barcode primer information