At the time of writing, the NPBS database includes 33,377 unique biological sources (distinguished by species name), 122,776 unique natural product molecules (distinguished by InChIKey) (21), and 898,294 relational data records. The biological sources cover diverse species of plants, bacteria, fungi and marine organisms. The molecules have proper chemical structure data and computable molecular properties, and every relational data record has a corresponding reference. The entity relationship diagram of the NPBS database is shown in Figure 3. Other features of the current database are shown in the following tables and figures.
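The counting rule described above can be illustrated with a minimal sketch. Sources are distinguished by species name, molecules by InChIKey, and each relational record carries a reference. The `Relation` record layout and the `summarize` helper are hypothetical. The InChIKeys and references are placeholders, not real NPBS content.

```python
from dataclasses import dataclass


# Hypothetical record layout: each relational record links one biological
# source (identified by species name) to one natural product (identified
# by InChIKey) and cites the reference reporting that link.
@dataclass(frozen=True)
class Relation:
    species: str
    inchikey: str
    reference: str


def summarize(records):
    """Count distinct sources, distinct molecules and distinct relational records."""
    unique_records = set(records)  # frozen dataclasses are hashable
    return {
        "biological_sources": len({r.species for r in unique_records}),
        "natural_products": len({r.inchikey for r in unique_records}),
        "relational_records": len(unique_records),
    }


# Toy records with placeholder InChIKeys and references (not NPBS data).
records = [
    Relation("Rosmarinus officinalis", "KEY-A", "ref-1"),
    Relation("Rosmarinus officinalis", "KEY-B", "ref-1"),
    Relation("Ocimum basilicum", "KEY-A", "ref-2"),
    Relation("Ocimum basilicum", "KEY-A", "ref-2"),  # verbatim duplicate, counted once
]
print(summarize(records))
# {'biological_sources': 2, 'natural_products': 2, 'relational_records': 3}
```

Because molecules are keyed by InChIKey rather than by name, the same compound reported under different names in different references would still be counted once.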
As shown in Table 2, the top 10 biological source species that provide the most natural products in the NPBS database are, as expected, all plants. One reason is that phytochemistry literature makes up the majority of the publications we currently cover. Another is that terrestrial plants are the most abundant and accessible biological source on earth, and human beings have a long history of using plants as food, medicines and materials. Interestingly, the top 2 species and the 9th are fruits, and five of the others are used as seasonings and spices. Each of the 10 species provides over 700 natural products. Two traditional Chinese herbal medicines, Artemisia annua and Hypericum perforatum, have been validated by modern science: their characteristic constituents show significant antimalarial and antidepressant activity, respectively (22,23).
Table 2. The top 10 biological source species providing the most natural products.
| No. | Species name | Common name | Number of natural products |
| --- | --- | --- | --- |
| 1 | Mangifera indica | mango | 1152 |
| 2 | Vitis vinifera | grape | 846 |
| 3 | Rosmarinus officinalis | rosemary | 790 |
| 4 | Artemisia annua | sweet sagewort | 797 |
| 5 | Capsicum annuum | cayenne pepper | 764 |
| 6 | Ocimum basilicum | sweet basil | 761 |
| 7 | Hypericum perforatum | common St. John's wort | 731 |
| 8 | Foeniculum vulgare | sweet fennel | 724 |
| 9 | Psidium guajava | guava | 723 |
| 10 | Coriandrum sativum | coriander | 723 |
As shown in Table 3, the top 10 natural product molecules derived from the most biological sources in the NPBS database comprise 8 terpenoids, 1 steroid and 1 aliphatic acid. Each is derived from over 4,000 biological sources. Terpenoids are a large group of substances that occur in most organisms and play vital biological roles, for example as antioxidants and nutrients (24). The steroid “β-sitosterol” and the aliphatic acid “palmitic acid” are also widespread in organisms as members of important classes of bioorganic molecules (25).
Table 3. The top 10 natural product molecules derived from the most biological sources.
| No. | Name | Number of biological sources |
| --- | --- | --- |
| 1 | α-pinene | 6609 |
| 2 | limonene | 6064 |
| 3 | β-pinene | 5899 |
| 4 | (-)-β-linalool | 5730 |
| 5 | p-cymene | 5354 |
| 6 | myrcene | 5304 |
| 7 | β-sitosterol | 5112 |
| 8 | palmitic acid | 4871 |
| 9 | (-)-terpinen-4-ol | 4749 |
| 10 | camphene | 4636 |
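Both rankings can be described as counting distinct partners in the species-molecule relation. Table 2 ranks species by their number of natural products, and Table 3 ranks molecules by their number of biological sources. A minimal sketch, assuming the relation is available as (species, molecule) pairs. The `top_n` helper and the toy pairs are illustrative, not NPBS records.

```python
from collections import defaultdict


def top_n(pairs, n=10, by="molecule"):
    """Rank entities by the number of distinct partners they are linked to.

    pairs: iterable of (species, molecule) tuples.
    by="molecule": rank molecules by distinct source species (as in Table 3).
    by="species":  rank species by distinct molecules (as in Table 2).
    """
    if by not in ("molecule", "species"):
        raise ValueError("by must be 'molecule' or 'species'")
    partners = defaultdict(set)
    for species, molecule in pairs:
        if by == "molecule":
            partners[molecule].add(species)
        else:
            partners[species].add(molecule)
    counts = [(name, len(linked)) for name, linked in partners.items()]
    # Highest count first; ties broken alphabetically so the order is stable.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:n]


# Toy pairs for illustration only.
pairs = [
    ("Rosmarinus officinalis", "α-pinene"),
    ("Foeniculum vulgare", "α-pinene"),
    ("Coriandrum sativum", "α-pinene"),
    ("Rosmarinus officinalis", "camphene"),
    ("Coriandrum sativum", "camphene"),
    ("Coriandrum sativum", "camphene"),  # repeated link is counted once
    ("Ocimum basilicum", "limonene"),
]
print(top_n(pairs, n=2))  # [('α-pinene', 3), ('camphene', 2)]
print(top_n(pairs, n=3, by="species"))
```

Counting sets of partners, rather than raw rows, keeps a compound reported for the same species in several references from inflating its rank.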
As shown in Figure 4, the molecular features of the natural products in the NPBS database appear chemically different from those of the molecules in other chemical databases (http://www.organchem.csdb.cn/). In terms of structural complexity, more than 86% of the natural products have a ring system, and over one third have more than 3 rings (Figure 4.A). In addition, 56% of them are heterocyclic (Figure 4.B). Approximately half of the natural products are aliphatic, and 58% of them have chiral centers (Figure 4.B). The natural products also have a markedly high average oxygen content. Over 93% contain at least one oxygen atom, and 16% contain more than 10 oxygen atoms, which is striking when compared with their nitrogen content (Figure 4.C).
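The percentages above can be derived from per-molecule structural descriptors. A minimal sketch, assuming such descriptors have already been computed by a cheminformatics toolkit. The `Descriptors` fields and the `feature_profile` helper are hypothetical, not the NPBS schema. Every percentage here is taken over all molecules; the denominators behind some of the figures in the text may differ.

```python
from dataclasses import dataclass


# Hypothetical per-molecule descriptors, assumed to be precomputed by a
# cheminformatics toolkit; field names are illustrative, not the NPBS schema.
@dataclass
class Descriptors:
    ring_count: int
    has_heterocycle: bool
    chiral_centers: int
    oxygen_atoms: int


def percent(flags):
    """Percentage of True values in an iterable of booleans (0.0 if empty)."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def feature_profile(mols):
    """Summarize structural features as percentages of all molecules."""
    return {
        "with_ring_system": percent(m.ring_count > 0 for m in mols),
        "more_than_3_rings": percent(m.ring_count > 3 for m in mols),
        "heterocyclic": percent(m.has_heterocycle for m in mols),
        "with_chiral_centers": percent(m.chiral_centers > 0 for m in mols),
        "with_oxygen": percent(m.oxygen_atoms > 0 for m in mols),
        "more_than_10_oxygens": percent(m.oxygen_atoms > 10 for m in mols),
    }


# Four toy molecules (made-up descriptor values).
mols = [
    Descriptors(ring_count=0, has_heterocycle=False, chiral_centers=0, oxygen_atoms=0),
    Descriptors(ring_count=2, has_heterocycle=False, chiral_centers=2, oxygen_atoms=1),
    Descriptors(ring_count=4, has_heterocycle=True, chiral_centers=5, oxygen_atoms=12),
    Descriptors(ring_count=1, has_heterocycle=True, chiral_centers=1, oxygen_atoms=3),
]
print(feature_profile(mols))
```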
For those interested in using natural products as starting points for medicinal chemistry and drug discovery, Lipinski’s rule of five and the related rotatable-bond criterion may be a valuable reference. They give insight into the “drug-likeness” of the molecules in the NPBS database (26). Over half of the natural products are within the bounds of the five primary parameters shown in Figure 5:
- molecular weight less than 500
- fewer than 5 hydrogen bond donors
- fewer than 10 hydrogen bond acceptors
- fewer than 10 rotatable bonds
- LogP less than 5

This suggests that natural products are a rich source of potential drug candidates.
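A drug-likeness screen over the five parameters above can be sketched as a simple threshold check. This sketch assumes the descriptor values are already computed. The `DrugLikeness` record and helper names are hypothetical. The text states the limits as strict “less than”. Lipinski’s and Veber’s original formulations are usually stated inclusively (for example, no more than 5 hydrogen bond donors), so the sketch exposes both readings through a `strict` flag. It also reports both possible readings of “within the bounds”: all five parameters at once, or each parameter separately.

```python
from dataclasses import dataclass


@dataclass
class DrugLikeness:
    mol_weight: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    logp: float


# Upper limits for the five parameters listed in the text.
LIMITS = {
    "mol_weight": 500,
    "h_donors": 5,
    "h_acceptors": 10,
    "rotatable_bonds": 10,
    "logp": 5,
}


def within_bounds(d, strict=True):
    """Per-parameter check; strict=True uses '<' as worded in the text,
    strict=False uses the inclusive '<=' of the classic formulations."""
    result = {}
    for name, limit in LIMITS.items():
        value = getattr(d, name)
        result[name] = value < limit if strict else value <= limit
    return result


def fraction_within_all(mols, strict=True):
    """Fraction of molecules satisfying all five limits at once."""
    mols = list(mols)
    if not mols:
        return 0.0
    hits = sum(all(within_bounds(m, strict).values()) for m in mols)
    return hits / len(mols)


def fraction_per_parameter(mols, strict=True):
    """Fraction of molecules within each limit, parameter by parameter."""
    mols = list(mols)
    if not mols:
        return {name: 0.0 for name in LIMITS}
    checks = [within_bounds(m, strict) for m in mols]
    return {name: sum(c[name] for c in checks) / len(mols) for name in LIMITS}


# Toy molecules with made-up descriptor values.
mols = [
    DrugLikeness(300.0, 2, 4, 3, 2.5),   # within every limit
    DrugLikeness(780.0, 8, 15, 9, 1.0),  # too heavy, too many donors/acceptors
    DrugLikeness(450.0, 5, 6, 4, 3.0),   # exactly 5 donors: fails only when strict
]
print(fraction_within_all(mols))                # strict reading
print(fraction_within_all(mols, strict=False))  # inclusive reading
print(fraction_per_parameter(mols))
```

The third toy molecule shows why the choice of operator matters: values sitting exactly on a limit flip between passing and failing.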
To evaluate the NPBS database, we compared it with other accessible natural product databases by searching each of them for several common natural products and biological sources. The results, listed in Table 4, suggest that NPBS may be better suited to two kinds of search. The first is finding the range of biological sources of a specific natural product. The second is finding the molecular properties of natural products derived from a specific biological source. For example, NPBS returns 17 biological sources that contain the natural product “aconitine”, and 49 natural products derived from the biological source “Artemisia apiacea”.
Table 4. Comparison of NPBS with other accessible natural product databases.
| Data resource | Result of searching natural products | Result of searching biological sources |
| --- | --- | --- |
| Super Natural II | Molecular properties | Not available |
| TCM Database | Molecular properties | Only some Chinese medicinal herbs available |
| Reaxys | Substances and documents | Documents |
| TCMID | Not available | Only some traditional Chinese medicines available |
| NuBBE database | Number of results | Number of results |
| TIPdb | No result | No result |
| NPBS | Molecular properties and biological sources | Natural products and molecular properties |
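Table 4 compares two query directions: from a natural product to its biological sources, and from a biological source to its natural products. Both follow from indexing the same relational records twice. The `RelationIndex` class below is a hypothetical in-memory sketch, not the NPBS implementation. A relational database with an index on each side of the link table could serve the same two queries. The example pairs are illustrative only and are not taken from NPBS.

```python
from collections import defaultdict


class RelationIndex:
    """Toy in-memory index over (species, molecule) relational records,
    supporting lookups in both directions."""

    def __init__(self, pairs):
        self._sources = defaultdict(set)   # molecule -> species
        self._products = defaultdict(set)  # species -> molecules
        for species, molecule in pairs:
            self._sources[molecule].add(species)
            self._products[species].add(molecule)

    def sources_of(self, molecule):
        """Biological sources reported for a natural product (sorted)."""
        return sorted(self._sources.get(molecule, set()))

    def products_of(self, species):
        """Natural products reported for a biological source (sorted)."""
        return sorted(self._products.get(species, set()))


# Illustrative pairs only; not taken from NPBS.
index = RelationIndex([
    ("Aconitum carmichaelii", "aconitine"),
    ("Aconitum napellus", "aconitine"),
    ("Aconitum carmichaelii", "mesaconitine"),
])
print(index.sources_of("aconitine"))  # ['Aconitum carmichaelii', 'Aconitum napellus']
print(index.products_of("Aconitum carmichaelii"))  # ['aconitine', 'mesaconitine']
print(index.sources_of("unknown"))  # []
```

Using `.get` on the underlying `defaultdict` avoids silently creating empty entries for names that were never indexed.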