Global Research on DNA methylation from 2000 to 2019: Bibliometric Analysis and Visualization

Xuan Su, Department of Sport Rehabilitation, Shanghai University of Sport, Shanghai, China
Lin-Man Weng, Department of Sport Rehabilitation, Shanghai University of Sport, Shanghai, China
Kang-Yong Zheng, Department of Sport Rehabilitation, Shanghai University of Sport, Shanghai, China
Xue-Qiang Wang (wangxueqiang@sus.edu.cn), Department of Sport Rehabilitation, Shanghai University of Sport, Shanghai, China


Introduction
In recent years, a growing number of studies have focused on DNA methylation, which carries a large amount of biological information. DNA methylation is currently the most common epigenetic feature in human somatic cells (1). It arises from the addition of a methyl group to the C5 position of cytosine (5meC) in a cytosine-followed-by-guanine dinucleotide (CG or CpG site), and it is both stable and heritable (2). DNA methylation is a dynamic process that occurs in a temporal, spatial, and cell type-specific manner. Methylation can be removed either passively or actively, with the ultimate function of responding to environmental stimuli (3).
The impact of methylation on biological information is enormous. The role of methylation in changing the activity of enhancers, insulators, and other regulatory elements is only beginning to be studied (4). However, its roles in transcriptional inhibition, genomic imprinting, dynamic biological processes and alternative splicing of pre-mRNA have been well studied (5)(6)(7). DNA methylation regulates genes that are highly involved in memory formation and precisely regulates cognitive function in humans (8). Studies of DNA methylation continue to demonstrate the richness and complexity of epigenetic gene regulation in the central nervous system and provide potential therapeutic targets for the treatment of neuropsychiatric disorders (9). DNA methylation can be used as a biomarker for aging (10) and cancer (11) and for predicting age (12), and it participates in the pathophysiology of diseases such as multiple sclerosis (3,13).
DNA methylation has attracted considerable attention from researchers and academic journals over the past ten years; however, few published articles have systematically analyzed the trends of scholarly production in this field. Bibliometrics is a quantitative method for analyzing published articles and has been widely used to study trends in research activity over time (14)(15)(16). Bibliometric studies have also been used to assess research trends for many genes and diseases, for example, long noncoding RNA (17), miRNAs and lncRNAs, exosomes (18,19), cancer (20,21), pain (22)(23)(24), heart disease (25,26), and obesity (27,28).
The purpose of our study is to systematically assess the trends of DNA methylation research from 2000 to 2019. We used CiteSpace V (Drexel University, Philadelphia, United States), one of the most frequently used bibliometric tools for analyzing trends in scientific research (16,29), to conduct a bibliometric study of records from the Web of Science Core Collection. In our study, the trends of DNA methylation research cover the number of published articles, the collaborations between countries, institutions and authors, co-citation analysis of authors and references, and a citation-burst analysis of keywords (15,30). Our findings would offer valuable information to scientific researchers, funding agencies and policy makers.

Sources of the data
Publications were limited to those published between January 1, 2000 and December 31, 2019. The data were downloaded and extracted from the Science Citation Index Expanded (SCI-Expanded) of the Web of Science Core Collection and used as secondary analysis data.
The following search terms were used: Title = (methylation) AND Title = (DNA OR Deoxyribonucleic acid) AND Language: (English).

Inclusion criteria
First, regarding manuscript types, we included only peer-reviewed reviews and articles and excluded conference presentations, conference abstracts, case reports, expert opinions and editorials. Second, the language was limited to English. Third, we included studies related to DNA methylation. No species restrictions were applied. The resulting dataset comprises 11,127 records, which were reviewed on 19 Jan 2020.

Data extraction
One author (Xuan Su) independently extracted the published papers and used EndNote (EndNote X7, Bld 7072, Thomson ResearchSoft, Stamford) to manage the downloaded publications. The data, downloaded from SCI-Expanded in txt format, were imported into CiteSpace V.
Characteristics such as journal, institution and country were recorded for each included paper. We also extracted bibliometric indicators, for example, publication count, citation frequency, H-index and essential science indicators (ESI) top papers. The H-index means that a researcher or research institution has H papers that have each been cited at least H times (31). For example, an H-index of 20 indicates that 20 of a researcher's published papers have each been cited at least 20 times. The higher the H-index, the greater the influence of the papers. ESI (32) top papers include hot papers (last two years) and highly cited papers (last ten years) for 22 broad subject categories in the ESI database of Web of Science.
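The H-index definition can be made concrete with a short computation. The following Python sketch is illustrative only: the function name h_index and the citation counts are hypothetical and are not taken from the data of this study or from any Web of Science tool.

```python
def h_index(citations):
    """Return the largest H such that H papers each have at least H citations."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # later papers have even fewer citations
    return h


# Hypothetical citation counts for five papers (not real data).
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

In this hypothetical example, four papers have at least four citations each, but there are not five papers with at least five citations, so the H-index is 4.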
Among the included studies, we selected and analyzed the top 15 subject categories of the Web of Science Categories. Research areas are classified into five broad categories in the Web of Science Categories, which together contain 225 subject categories, such as cell biology, oncology and physiology.

Statistical methods
Microsoft Excel 2020 was used to analyze and predict the number of publications on DNA methylation.
We used the prediction model f(x) = ax^3 + bx^2 + cx + d to calculate publication trends and to forecast the cumulative volume of publications on DNA methylation. The symbol x denotes the year, and f(x) denotes the number of papers published by that year. Microsoft Excel 2020 was also used to draw the world map of the distribution of papers on DNA methylation. CiteSpace is an appropriate tool for bibliometric analysis because it facilitates the analysis of trends and patterns in a knowledge structure. CiteSpace V was used to 1) analyze the distribution by journals, years, countries, authors and institutions, 2) evaluate the collaborations between countries, institutions and authors, 3) analyze citations, H-index and ESI top papers, and 4) analyze the references and keywords to predict future research trends. To test whether the number of articles published each year significantly increased or decreased over time, we used linear regression analysis with the year as the independent variable and the number of publications as the dependent variable. The linear regression analysis was conducted with IBM SPSS Statistics 22.0 software (SPSS, Inc, Chicago, USA). A P value of less than 0.05 was considered statistically significant.
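As an illustration of the cubic model, the sketch below fits a polynomial by ordinary least squares. This is an assumption about how such a trendline is estimated; it is not the Excel or SPSS procedure used in this study. The function names fit_polynomial and evaluate are hypothetical, the data are synthetic, and the year is expressed as an offset (for example, x = year - 2000) to keep the numbers small.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit; returns coefficients, highest power first.

    Requires at least degree + 1 distinct x values.
    """
    n = degree + 1
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # Normal equations (V^T V) c = V^T y, where V[i][j] = x_i ** j.
    A = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    coeffs = [b[i] / A[i][i] for i in range(n)]  # lowest power first
    return [float(c) for c in reversed(coeffs)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x (Horner's rule, highest power first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Synthetic data generated from a known cubic (not publication counts).
years_offset = list(range(6))
counts = [2 * x**3 - x**2 + 3 * x + 5 for x in years_offset]
a, b_, c, d = fit_polynomial(years_offset, counts, degree=3)
forecast = evaluate([a, b_, c, d], 6)  # extrapolate one step ahead
```

Calling fit_polynomial with degree=1 gives the slope and intercept of a linear trend; the significance test (P value) reported in this study would still require a statistics package.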

Publication outputs and growth trends prediction
A total of 11,127 publications met the inclusion criteria. The annual number of published papers is shown by period (Figure 1A). In general, research interest in DNA methylation has gradually increased over the past 20 years, with the number of publications rising from 101 in 2000 to 1273 in 2019. Figure 1B
As shown in Figure 3, in terms of co-citation coupled with centrality, Nature, Cell, P Natl Acad Sci USA, Nat Genet, Nucleic Acids Res, and Genome Biol are the most influential journals in DNA methylation research.

Distribution by countries and institutions
A total of 11,127 papers on DNA methylation research were published in 107 countries/territories. Widespread collaboration between countries/territories is shown in Figure 4A. Among the top ten countries (Table 2)

The distribution of authors
A total of 44,633 authors contributed to the 11,127 published papers. The cooperation between authors is outlined in a network map (Figure 7A). Table 4 shows the ten authors with the most published papers. Among them, ANDREA A BACCARELLI had the most published papers (66), followed by MOSHE SZYF (46), MANEL ESTELLER (42), and CARMEN J MARSIT (42). A network map of author co-citation was generated using CiteSpace V (Figure 7B). Among the top 10 co-cited authors (Table 4)

Analysis of references
Reference analysis is a significant indicator in bibliometric studies. Figure 8A shows the co-citation map of references, which indicates the associations among published studies on DNA methylation. The modularity Q value was 0.8963 (greater than 0.5), indicating that the division of the co-citation network into clusters is reasonable. All clusters were labeled by index terms extracted from the references. As shown in the figure, the largest cluster is labeled "specific DNA methylation", the second largest cluster (#1) is labeled "DNA methylation dynamics", the third largest is labeled "non-conventional DNA methylation", and the fourth largest is labeled "lung cancer". A timeline view is shown for the top 12 clusters (Figure 8B).
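For orientation, modularity Q is commonly defined following Newman. Assuming that CiteSpace reports this standard quantity, it can be written as below, where A_ij is the adjacency matrix of the co-citation network, k_i is the degree of node i, m is the number of edges, and δ(c_i, c_j) equals 1 when nodes i and j belong to the same cluster and 0 otherwise. These symbols are introduced here for illustration and are not part of the original analysis.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Under this definition, values of Q closer to 1 indicate a network that divides more cleanly into clusters.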

Analysis of keywords
CiteSpace V was used to extract keywords from the 11,127 published studies. The top 77 most-cited keywords are shown in Figure 10. The top five keywords with the strongest citation bursts were as
Figure 9 shows the keyword co-occurrence network of DNA methylation research from 2000 to 2019.

Characteristics of top 10 papers with the highest citation frequency
The ten most frequently cited papers on DNA methylation research are listed in Table 5. The most cited paper (4205 citations) was the 2002 article by Bird A et al., entitled "DNA methylation patterns and epigenetic memory" and published in GENE DEV. Among the top 10 papers, six were published in journals with impact factor ≥ 40 (NAT REV GENET, NATURE), one in a journal with 20 ≤ impact factor < 40 (NAT GENET), two in journals with 10 ≤ impact factor < 20 (TRENDS BIOCHEM SCI, GENOME BIOL), and one in a journal with 1 ≤ impact factor < 10 (GENE DEV).

Global trends in DNA methylation research
The number of global publications on DNA methylation research has kept rising over time. Based on the observed increase in publications, our findings revealed two periods. The first period (2000-2004) can be considered the germination period of DNA methylation research. The first papers on DNA methylation research, which systematically described the enzymatic methylation of RNA and DNA, were published as early as 1963 (33,34), showing that the topic has long attracted research interest. In the first period, publications accounted for 6.36% of the total number of publications on DNA methylation research. Since then, the number has almost doubled every five years. The second period (2015-2019) can be regarded as a golden period for DNA methylation research, as 50.84% of papers were published during this period. Although the H-index for this period is nearly half that of the previous five years, citations in 2019 are the highest. Among the top 10 countries, the United States has made outstanding contributions to DNA methylation research, with the highest number of papers (4,263), citations (235,449), citations per paper, H-index (216), and ESI top papers (117). Among the top 10 countries, there were two Asia-Pacific countries, five European countries, one American country and one Oceania country. In addition, ranked by number of publications, eight of the leading organizations were from North America (seven from the United States) and two were from Asia-Pacific countries. American research therefore has considerable influence in this field. Researchers interested in this field can pay more attention to the research direction of relevant institutions in the United States.

Emerging trends of DNA methylation research
According to the latest Web of Science categories, the hot research areas among the 11,127 papers on DNA methylation were chiefly concentrated in Genetics Heredity, Biochemistry Molecular Biology and Oncology. As shown in Figure 6, in terms of the number of publications, "Genetics Heredity" (2516 publications) led the first research echelon, followed by "Biochemistry Molecular Biology" (2286 publications), "Oncology" (1836 publications), "Cell Biology" (1148 publications), and "Multidisciplinary Sciences" (1063 publications).
On the basis of the co-citation map of references, we noticed that "specific DNA methylation" was labeled as the largest cluster, followed by "DNA methylation dynamics", "non-conventional DNA methylation", and "lung cancer". Ambrosi C et al. showed that DNA methylation can directly affect gene expression patterns and cell recognition, linking promoters to gene activity dynamics (35). Pitto L et al. suggested that the range and environment of methylation might help regulate gene expression (36). Mehta A et al. showed that DNA methylation is the most intensively studied epigenetic marker of human cancer, and that some cancers occur and develop as a result of interactions between permanent genetic and dynamic epigenetic changes (37).
Among the top 10 most cited papers, the paper written by Bird A et al. and published in Gene Dev was the most cited (4205 times); it focuses on DNA methylation and covers not only the generation of methylation patterns in mammalian genomes but also genetic and biological knowledge (8). Jones P et al. published the paper entitled "Functions of DNA methylation: islands, start sites, gene bodies and beyond" in NAT REV GENET, showing that DNA methylation can be evaluated in different genomic contexts: at transcriptional start sites with or without CpG islands, in gene bodies, at regulatory elements and at repeat sequences (4).
Burst keywords, which appear frequently within a certain period, are regarded as indicators of frontier topics or emerging trends (38). CiteSpace can be used to detect burst keywords. Figure 10 shows the top 77 keywords with the strongest citation bursts from 2000 to 2019. By the end of 2019, the most recent burst keywords were as follows: "brain" (2014-2019), "stem cell" (2014-2019), "epigenome wide association" (2015-2019), "oxidative stress" (2016-2019), "inflammation" (2016-2019), "methylome" (2016-2019), "pregnancy" (2017-2019), "obesity" (2017-2019), and "growth" (2017-2019). The keyword "blood", whose burst began in 2015, has the strongest citation burst (42.1367). We observed that research on DNA methylation is becoming deeper and broader. Differential DNA methylation in the brain is associated with many psychiatric diseases (39), and DNA methylation also plays an important role in stem cell pluripotency and differentiation (40). For the keywords whose bursts began between 2017 and 2019, the literature shows that blood DNA methylation during pregnancy is sensitive to air pollutants and affects the health of the fetus (41). DNA methylation is used to explain genetic variability in obesity (42). Some evidence suggests that DNA methylation is associated with (i) changes in lipid and glucose metabolism, (ii) diabetes and (iii) size and composition in children (43). Hence, further DNA methylation research should focus on these keywords.

Strengths and limitations
We are the first research team to analyze the trends of DNA methylation research from 2000 to 2019 in the SCI-Expanded of Web of Science through visual analysis. Furthermore, our trend analysis covers the number of publications, academic journals, the distribution of and collaborations between countries, institutions and authors, H-index and ESI papers, co-citation analysis of authors and journals, and analysis of keywords, which could offer valuable information for DNA methylation researchers seeking potential collaborators, cooperative institutions and research frontiers.
There are some limitations in this study. Only the SCI-Expanded of Web of Science was searched and analyzed; other electronic databases, such as PubMed, Scopus, Cochrane Library and Embase, were not. In addition, non-English publications, which were few in number and may not change the conclusions, were excluded during retrieval. This study focuses on quantitative analysis and includes less qualitative analysis. Another limitation is that influential papers are not always highly cited: some potentially influential papers are cited frequently only once their findings become well known, and other influential and relevant papers published recently may not yet have accumulated many citations.

Conclusions
Here, for the first time, we performed a bibliometric analysis of the scientific literature on DNA methylation. We conducted quantitative and qualitative analyses of global scientific papers on this topic published from 2000 to 2019. This will help researchers understand the trends of DNA methylation research. The top three countries by number of publications were USA, Peoples R China, and Germany. Plos One, Epigenetics, and Clinical Epigenetics were the top three journals publishing DNA methylation research. The most prolific institution is Chinese Acad Sci, followed by Harvard University and Johns Hopkins Univ. These institutions are potential partners for cooperation and communication. The latest research frontiers may be "pregnancy", "obesity", and "growth". The hot topics and research frontiers discussed in this study reveal the development trends in the applications of DNA methylation. The related studies may pioneer this field in the next few years and may help researchers to identify new directions with renewed focus. In conclusion, this study provides an insight into DNA methylation and valuable information for researchers to identify new perspectives concerning potential collaborators and cooperative institutions, hot topics, and research frontiers.
Competing interests: All authors declare that there are no conflicts of interest.
Author Contributions: XQ. Wang designed the study. All authors whose names appear on the submission contributed substantially to the acquisition of data, or the analysis and interpretation of the data. Xuan Su wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
World map of total country output based on DNA methylation research.