We plotted the annual publication volume for a total of 23,513 publications from 1959 to 2020 using Tableau (Figure 1). The figure supports the following findings.
1. Before the 1990s, the volume of published literature was low and grew slowly, giving a relatively flat trend.
2. The dashed trend line and the mean-value axis show that 2000 was a turning point in literature growth. From 2000 to the present, the number of publications has grown explosively at an increasing rate, with an exponential trend after 2010.
3.1 Subheading analysis
We counted all metformin-related subheadings from 1959 to 2020 and found that the five subheadings with the highest percentages were TU (therapeutic use), PD (pharmacology), AE (adverse effects), AD (administration and dosage), and AA (analogues and derivatives). Their percentages in each time period are shown in Figure 2. TU and PD both showed an overall trend of first decreasing and then increasing, while AE increased, then decreased, and then increased again. For AD and AA, although there was some variation in each period, the overall trend was relatively flat.
3.2 Statistical analysis of test labels
Using 5-year time periods, we calculated with Tableau the frequency of each of the top 9 test labels as a percentage of the total frequency of test labels in each period (Figure 3). Only the top 9 test labels were considered because the other labels accounted for less than 1% of the total label frequency in most time periods and showed no significant trends or characteristics. Analyzing the test labels assigned to the publications provides information about the general aspects of the studies over 60 years. Data for 1959-1960 were excluded because only three papers were published in these two years, which is not representative; including them might introduce noise and affect the overall model.
1. The label “Human” has always been dominant, but its percentage decreased from 20% to 12.65% between 1961 and 1978 and has shown an overall increasing trend only since 1978.
2. The percentages of the labels “Male”, “Female”, “Middle Aged”, “Adult” and “Aged” all increased from 1961 to 1972 and all decreased during 1972-1978.
3. The percentage of the “Male” label was always higher than that of the “Female” label during 1972-1997, whereas the percentage of the “Female” label was always higher than that of the “Male” label during 1997-2020.
4. The label “Clinical” has shown an overall increasing trend since 2000, in contrast to the period before 2000.
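The per-period percentages described above can be illustrated with a minimal sketch. It assumes that each record is a pair of publication year and assigned labels, that periods are fixed 5-year bins starting in 1961, and that a label's percentage is its count divided by the count of all label assignments in that period. The function names, the bin boundaries, and the toy records are hypothetical and are not taken from the study, which used Tableau.

```python
from collections import Counter, defaultdict


def period_start(year, first_year=1961, width=5):
    """Return the first year of the fixed-width period that contains `year`."""
    return first_year + ((year - first_year) // width) * width


def label_shares(records, first_year=1961, width=5):
    """Percentage of each label among all label assignments in each period.

    `records` is an iterable of (publication_year, [labels]) pairs.
    Years before `first_year` are skipped (the text excludes 1959-1960).
    """
    counts = defaultdict(Counter)
    for year, labels in records:
        if year < first_year:
            continue
        counts[period_start(year, first_year, width)].update(labels)

    shares = {}
    for start, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[start] = {label: 100.0 * n / total for label, n in counter.items()}
    return shares


# Hypothetical toy records, not data from the study.
records = [
    (1960, ["Human"]),
    (1961, ["Human", "Male"]),
    (1963, ["Human", "Female"]),
    (1967, ["Human", "Male", "Adult"]),
    (1968, ["Animals"]),
]
shares = label_shares(records)
for start, by_label in shares.items():
    print(start, by_label)
```

Dividing by the total number of label assignments, rather than by the number of papers, follows the wording "percentage ... in the frequency of total literature test labels"; if the denominator were the number of papers instead, only the `total` line would change.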
3.3 Correspondence analysis of year and subheading
We used SPSS Statistics 26 to perform a correspondence analysis of the metformin subheadings, aiming to explore the development pattern of metformin research further. By default, SPSS expects each row of data to be a single case. A two-dimensional table, however, is a cross-tabulation of two categorical variables, so after it is converted to a one-dimensional table, each row still corresponds to a count of cases summarized from the categorical variables. We therefore first weighted the cases by the subheading frequencies. We selected the first two dimensions obtained from the analysis, which together explain 63.8% of the inertia (Figure 4) and reflect the relationship between years and subheadings: the first dimension explains 51.9% of the inertia and the second explains 11.9%. The red circles in the figure denote years, the blue circles denote subheading names, and the horizontal and vertical coordinates of each point denote its score on each dimension (i.e., contribution of the dimension). The distance between two years or two subheadings was measured with the chi-square distance, that is, the Euclidean distance between their profiles (relative frequencies) with each component weighted by the inverse of the corresponding mass. Within the same dimension, the closer two categories of the same variable are, the higher their correlation.
The figure shows that the year nodes and subheading nodes are distributed almost entirely on two sides of the line Y=X: most nodes above Y=X are year nodes, while most nodes below Y=X are subheading nodes. The smaller the combined distance (i.e., the minimum straight-line distance) of a year node to Y=X and of a subheading node to Y=X, the higher the correlation between the two nodes.
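The data preparation and the distance measure described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS procedure itself: the year-by-subheading counts are hypothetical, the helper names are invented for this example, and only the standard definitions are used (a profile is a row of relative frequencies, a column mass is the column total divided by the grand total, and total inertia equals the Pearson chi-square statistic divided by the grand total).

```python
import math


def to_weighted_cases(table, row_labels, col_labels):
    """Flatten a cross-table into (row, column, weight) cases, one per non-empty cell."""
    return [(r, c, table[i][j])
            for i, r in enumerate(row_labels)
            for j, c in enumerate(col_labels)
            if table[i][j] > 0]


def row_profiles_and_masses(table):
    """Return the row profiles and the column masses of a cross-table."""
    n = sum(sum(row) for row in table)
    profiles = [[x / sum(row) for x in row] for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return profiles, col_mass


def chi_square_distance(p, q, col_mass):
    """Euclidean distance between two profiles, each squared term weighted by 1/mass."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, col_mass)))


def total_inertia(table):
    """Total inertia: Pearson chi-square statistic divided by the grand total."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(len(table[0]))]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / n


# Hypothetical counts (rows: years, columns: subheadings), not data from the study.
years = ["2018", "2019", "2020"]
subheadings = ["TU", "PD"]
table = [[30, 10],
         [20, 20],
         [30, 10]]

cases = to_weighted_cases(table, years, subheadings)
profiles, col_mass = row_profiles_and_masses(table)
d_2018_2019 = chi_square_distance(profiles[0], profiles[1], col_mass)
d_2018_2020 = chi_square_distance(profiles[0], profiles[2], col_mass)
inertia = total_inertia(table)
print(cases[0], d_2018_2019, d_2018_2020, inertia)
```

In this toy table, 2018 and 2020 have identical profiles, so their chi-square distance is zero even though a different pair of absolute counts with the same proportions would give the same result; this is why the analysis compares years by the shape of their subheading distribution rather than by publication volume.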
Among them, the six subheadings pharmacokinetics, surgery, therapy, pharmacology, physiology, and therapeutic use are the subheading nodes closest to Y=X, while 1960, 1961, 2005, 2006, 2009-2013, and 2015-2020 are the year nodes closest to Y=X. Comparing them jointly shows that the year nodes closest to pharmacokinetics are 1960, 1961, 2005, and 2006; those closest to therapy and surgery are 2009-2013; and those closest to pharmacology, physiology, and therapeutic use are 2015-2020, indicating a strong correlation between these subheadings and years. To further present the correlations among years and among subheadings, we also performed correspondence analyses for years and for subheadings separately (Figure 5 and Figure 6) and a cluster analysis for years (Figure 7).
Figure 4 shows that the year nodes before the 21st century are generally distributed on the negative half of the second-dimension axis (i.e., below the green solid line) and are relatively scattered. By comparison, the year nodes of the 21st century are more concentrated and generally contribute more to the model than the earlier year nodes, indicating that the turn of the 21st century was an important turning point for research on metformin. Figure 5 shows that the four subheading nodes rehabilitation, transmission, pathogenicity, and urine lie relatively close to the margin of the model. Except for the urine node, which has a low contribution in both dimensions, the transmission, pathogenicity, and rehabilitation nodes show high contributions in both the first and second dimensions, indicating that they are high-quality nodes even though they differ somewhat from the other subheading nodes.
To keep the amount of inertia carried by each historical period as even as possible, we used the clustering tool gCLUTO to perform a cluster analysis of all time nodes from 1959 to 2020 (Figure 6) and eventually limited the number of clusters to five: 1959-1998, 1999-2005, 2006-2009, 2010-2013, and 2014-2020. The first cluster, 1959-1998, spans a longer period because relatively few papers were published before the 21st century, and splitting it further would yield clusters carrying even less inertia. In the cluster peak plot, the peak volume of the 1959-1998 cluster is rather small and the color variation within the cluster is large, suggesting that this period carries little inertia, that research on metformin was still at an initial stage, and that research themes varied widely. Compared with the 1959-1998 cluster, the peak volume of the 1999-2005 cluster is larger, and the color variation within the cluster indicates stronger linkage among research themes in this period. The peak volumes of the 2006-2009, 2010-2013, and 2014-2020 clusters are significantly larger than those of the previous two clusters, and the colors within these clusters are more uniform, illustrating that research on metformin has increased significantly in recent years, with continuous expansion in the depth and breadth of the research content.
Next, we counted the five top-ranked subheadings in each cluster (i.e., each historical period), together with their frequencies within the cluster and globally and the resulting percentages (Table 1). Table 1 shows that studies of metformin in 1959-1998 mainly focused on UR (urine), PO (poisoning), NU (nursing), PHY (physiology), and TU (therapeutic use). Studies in 1999-2005 focused on RE (rehabilitation), PHY (physiopathology), VE (veterinary), PO (poisoning), and ST (standards). Studies in 2006-2009 focused on RE (rehabilitation), PO (poisoning), VE (veterinary), PHY (physiopathology), and PHA (pharmacokinetics). Studies in 2010-2013 focused on NU (nursing), VI (virology), ST (standards), MO (mortality), and OA (organization and administration). Studies in 2014-2020 focused on TR (transmission), PA (pathogenicity), PARA (parasitology), TRAN (transplantation), and MI (microbiology).
Table 1 Cluster analysis for the years 1959-2020
Description of five research periods based on the study of Metformin subheadings
| Clusters/Subheadings | %Global | Internal freq. | Global freq. | Prob. |
| Cluster 1. Years 1959-1998 | | | | |
| urine | 10.61 | 19 | 179 | 0 |
| poisoning | 9.28 | 9 | 97 | 0.002 |
| nursing | 6.25 | 1 | 16 | 0.004 |
| physiology | 5.59 | 814 | 14571 | 0 |
| therapeutic use | 5.49 | 852 | 15523 | 0 |
| Cluster 2. Years 1999-2005 | | | | |
| rehabilitation | 20.51 | 8 | 39 | 0.006 |
| physiopathology | 15.95 | 351 | 2200 | 0 |
| veterinary | 14.81 | 4 | 27 | 0.007 |
| poisoning | 14.43 | 14 | 97 | 0.004 |
| standards | 12.67 | 28 | 221 | 0 |
| Cluster 3. Years 2006-2009 | | | | |
| rehabilitation | 30.77 | 12 | 39 | 0.002 |
| poisoning | 22.68 | 22 | 97 | 0.004 |
| veterinary | 22.22 | 6 | 27 | 0.01 |
| physiopathology | 16.27 | 358 | 2200 | 0 |
| pharmacokinetics | 15.06 | 155 | 1029 | 0 |
| Cluster 4. Years 2010-2013 | | | | |
| nursing | 43.75 | 7 | 16 | 0.008 |
| virology | 30.00 | 21 | 70 | 0.003 |
| standards | 27.15 | 60 | 221 | 0 |
| mortality | 23.01 | 180 | 782 | 0 |
| organization and administration | 22.71 | 186 | 819 | 0 |
| Cluster 5. Years 2014-2020 | | | | |
| transmission | 100 | 4 | 4 | 0.007 |
| pathogenicity | 96.67 | 29 | 30 | 0.01 |
| parasitology | 91.67 | 11 | 12 | 0.04 |
| transplantation | 83.33 | 10 | 12 | 0.006 |
| microbiology | 82.50 | 198 | 240 | 0 |
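The %Global values in Table 1 are consistent with the internal frequency divided by the global frequency, expressed as a percentage. The short sketch below checks this reading against a few rows of the table; the function name is hypothetical, and this interpretation of %Global is inferred from the tabulated values rather than stated in the text. The Prob. column is not reproduced here because its computation is not described.

```python
def percent_global(internal_freq, global_freq):
    """Share of a subheading's global occurrences that fall inside one cluster, in percent."""
    return round(100.0 * internal_freq / global_freq, 2)


# (subheading, internal freq., global freq., %Global as printed in Table 1)
rows = [
    ("urine", 19, 179, 10.61),
    ("poisoning", 9, 97, 9.28),
    ("nursing", 1, 16, 6.25),
    ("rehabilitation", 8, 39, 20.51),
    ("transmission", 4, 4, 100.0),
]

for name, internal, overall, printed in rows:
    print(name, percent_global(internal, overall), printed)
```

Read this way, a high %Global means that a subheading is concentrated in one period, which can happen even when its absolute frequency is very small (for example, transmission with 4 of 4 occurrences in 2014-2020).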
3.4 Correspondence analysis of indexing term-subheading combinations and year
Analyzing indexing terms together with their subheadings can reveal more concrete information about the content of the texts than analyzing indexing terms alone. We conducted a year-by-year statistical analysis of all indexing terms for the 62 years from 1959 to 2020 and found 219 commonly used indexing terms with 35,819 occurrences. The frequency distribution of indexing terms is highly skewed, with 61% of indexing terms occurring only once (incidental words). To avoid excessive noise, we excluded these incidental words from our analysis. We performed a correspondence analysis of the 219 indexing terms, and the first two dimensions explained a total of 82.1% of the inertia (Figure 8). Because the number of indexing terms is very large, we show only the 30 indexing terms that contribute most to the first two dimensions, to avoid overlapping labels. Five of these indexing terms, namely Metformin TU, Hypoglycemic-Agents TU, Insulin TU, Diabetes-Mellitus-Type2 DT, and Metformin PD, account for a relatively high percentage of the global text (Table 2), so the analysis of the other indexing terms might be distorted by these extreme values. All five terms characterize the glucose-lowering role of metformin, an application that has been studied throughout the development of metformin and is generally recognized. Because our primary goal was to explore applications of metformin in fields other than glucose lowering, we excluded these five indexing terms from the correspondence analysis and ultimately displayed only 25 indexing terms. Figure 7 shows the relationship between the indexing term-subheading combinations and the years.
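The exclusion of incidental words can be illustrated with a small sketch. It assumes the input is a flat list of indexing term occurrences and that a term is incidental when it occurs exactly once; the function name, the threshold parameter, and the toy list are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def split_incidental(term_counts, min_count=2):
    """Keep terms occurring at least `min_count` times.

    Returns the kept terms with their counts and the percentage of
    distinct terms that were dropped as incidental.
    """
    kept = {term: n for term, n in term_counts.items() if n >= min_count}
    dropped = len(term_counts) - len(kept)
    incidental_share = 100.0 * dropped / len(term_counts)
    return kept, incidental_share


# Hypothetical toy occurrences, not data from the study.
occurrences = [
    "Metformin TU", "Metformin TU", "Obesity DT", "Longevity DE",
    "Obesity DT", "Acarbose AD", "Glipizide AD",
]
counts = Counter(occurrences)
kept, incidental_share = split_incidental(counts)
print(kept, incidental_share)
```

Note that the share computed here is a percentage of distinct terms, matching the statement that 61% of indexing terms occur only once; the share of total occurrences removed by the same filter is much smaller when the distribution is highly skewed.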
The first dimension explains 54.9% of the inertia and the second dimension explains 28.2%. Proceeding chronologically from the earliest period to the most recent, we observed the following. The Acidosis PD node is close to the year nodes of 1970-1980, indicating a higher concentration of pharmacological studies on acidosis with biguanides during this period. The Phenformin AE node is relatively close to the year nodes of 1976-1980, illustrating that research on the adverse effects of phenformin was concentrated in that period. The Obesity DT node is close to the year nodes of 1980-1990, indicating that research on metformin in the treatment of obesity advanced in this period. The nodes related to cardio-cerebrovascular disease, namely Myocardial-Infarction DT, Arteriosclerosis DT, Thromboembolism PC, and Coronary-Disease DT, are close to the year nodes of 1997-2000, suggesting new advances regarding metformin in the treatment of cardio-cerebrovascular disease in that period. The Polycystic-Ovary-Syndrome DT node is close to the year nodes of 2000-2005, indicating new research on metformin in the treatment of polycystic ovary syndrome in that period. The Neoplasms DT and Gastrointestinal-Microbiome DE nodes are close to the year nodes of 2008-2014, demonstrating new advances in research on metformin in cancer treatment and gut microbiome composition in that period. Thiazolidinediones TU, Sitagliptin-Phosphate AD, Linagliptin AD, Canagliflozin AD, Exenatide TU, Liraglutide TU, and Glipizide AD are close to the year nodes after 2000, most evidently in 2015-2020. These nodes share a notable commonality with metformin: they are all hypoglycemic agents. Research on them focused on therapeutic use and on administration and dosage, and they could serve as substitutes for metformin.
This raises the question of whether these newer glucose-lowering drugs will affect the position of metformin in the field of glucose lowering; we elaborate on this issue in the conclusion. The Anti-Inflammatory Agents PD node is close to the year nodes of 2000-2020, demonstrating advances in pharmacological studies on the anti-inflammatory effects of metformin in that period. The Longevity DE node is close to the year nodes of 2015-2020, indicating advances in the application of metformin to delaying aging. The AMP-Activated-Protein-Kinases ME node is close to all of the 2008-2020 year nodes, indicating that the AMPK-related hypoglycemic mechanism has been a research highlight and hot spot from 2008 to the present and is likely to remain so.
Table 2 Frequency and percentage of the top 30 indexing term-subheading combinations in all metformin literature
Most frequent indexing terms in Metformin documents
| Descriptors | Freq. | % Doc. |
| Metformin TU | 2617 | 7.31% |
| Hypoglycemic-Agents TU | 2194 | 6.13% |
| Insulin TU | 1936 | 5.40% |
| Diabetes-Mellitus-Type2 DT | 1717 | 4.79% |
| Metformin PD | 1494 | 4.17% |
| AMP-Activated-Protein-Kinases ME | 693 | 1.93% |
| Neoplasms DT | 576 | 1.61% |
| Polycystic-Ovary-Syndrome DT | 504 | 1.41% |
| Gastrointestinal-Microbiome DE | 467 | 1.30% |
| Acidosis PD | 428 | 1.19% |
| Obesity DT | 404 | 1.13% |
| Coronary-Disease DT | 393 | 1.10% |
| Myocardial-Infarction DT | 361 | 1.01% |
| Thromboembolism PC | 348 | 0.97% |
| Arteriosclerosis DT | 316 | 0.88% |
| Anticholesteremic-Agents BL | 240 | 0.67% |
| Anticarcinogenic-Agents TU | 236 | 0.66% |
| Diabetic-Retinopathy DT | 228 | 0.64% |
| Intestinal-Absorption DE | 212 | 0.59% |
| Longevity DE | 207 | 0.58% |
| Thiazolidinediones TU | 192 | 0.54% |
| Sitagliptin-Phosphate AD | 183 | 0.51% |
| Canagliflozin AD | 176 | 0.49% |
| Linagliptin AD | 172 | 0.48% |
| Liraglutide TU | 163 | 0.46% |
| Exenatide TU | 154 | 0.43% |
| Acarbose AD | 153 | 0.43% |
| Phenformin AE | 142 | 0.40% |
| Glipizide AD | 140 | 0.39% |
| Anti-Inflammatory Agents PD | 136 | 0.38% |
Only 25 indexing terms are shown in the correspondence analysis figure, to keep the presentation clear. Showing more indexing terms at the expense of clarity would be expected to reveal further applications of metformin in other fields.