Bibliometric analysis is the quantitative evaluation of a corpus of scientific work based on bibliographic and article metadata. In this study we apply keyword co-occurrence analysis and bibliometric coupling to identify the structure of research on the COVID-19 pandemic. The results are subsequently mapped visually (4). Science mapping based on bibliometric analysis consists of several steps: literature search, cleaning and analysis of the bibliographic data, and finally visualization and interpretation of the results (3). We further apply social network analysis to the resulting bibliometric network graphs. The literature included in this study is collected from several sources, but each analysis uses a single database rather than combining results from different sources, for two reasons. First, few databases contain the bibliographic data required for the analysis. Second, the added value of combining sources is unlikely to outweigh the errors introduced when converting between formats. The network and bibliometric analyses are therefore run on a sample of articles rather than on all of them. As this sample represents a very large proportion of all articles, the results are considered valid (5).
For each analysis we choose the database deemed to have the best bibliometric data available for our unit of analysis. For the co-occurrence analysis we use data collected from PubMed, as it is the most comprehensive database for COVID-19-related literature and is updated daily (6); we confirmed this by running the search in Scopus, PubMed and ISI Web of Science. As PubMed does not include reference lists, we use data collected from Scopus for the bibliometric coupling analysis. Scopus was chosen over ISI Web of Science because of its more frequent updates and its focus on the life sciences (6).
The search criteria include the following terms: “COVID-19”, “SARS-CoV-2”, “severe acute respiratory syndrome coronavirus 2”, “2019-nCoV” and “2019 novel coronavirus”. The term “coronavirus” on its own is not included, to avoid capturing research on other coronaviruses. As COVID-19 was first detected in Wuhan, China, in December 2019, the study includes research published from 2020 onwards. In Scopus we searched for these terms in TITLE-ABS-KEY, and in PubMed in Title/Abstract (the full search strings for each database are provided on the website). The analysis was conducted using VOSviewer 1.6.14 (7), which is generally accepted as best practice in the science mapping literature (8). The data are cleaned by merging singular and plural forms of a term, as well as synonyms. The search terms themselves are excluded from the analysis, as are generic terms such as “study”. A file documenting all cleaning steps is provided on the website.
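The cleaning step can be illustrated with a minimal Python sketch. The thesaurus entries, the generic-term list and the function name `clean_keywords` are hypothetical examples chosen for illustration, not the contents of the cleaning file on the website; only the excluded search terms are taken from the search criteria above.

```python
# Hypothetical thesaurus: each variant form (plural, alternative spelling,
# abbreviation) is mapped to a single canonical keyword.
THESAURUS = {
    "face masks": "face mask",
    "health care workers": "healthcare worker",
    "healthcare workers": "healthcare worker",
    "icu": "intensive care unit",
}

# The search terms match every record, so they carry no structural information.
SEARCH_TERMS = {
    "covid-19",
    "sars-cov-2",
    "severe acute respiratory syndrome coronavirus 2",
    "2019-ncov",
    "2019 novel coronavirus",
}
# Illustrative (hypothetical) list of generic terms to drop.
GENERIC_TERMS = {"study"}
EXCLUDED = SEARCH_TERMS | GENERIC_TERMS


def clean_keywords(keywords):
    """Return one article's keywords with variants merged and excluded terms removed."""
    cleaned = []
    for keyword in keywords:
        term = keyword.strip().lower()
        term = THESAURUS.get(term, term)  # merge plural/synonym variants
        if term in EXCLUDED or term in cleaned:
            continue  # drop search/generic terms and duplicates created by merging
        cleaned.append(term)
    return cleaned
```

For example, `clean_keywords(["COVID-19", "Face masks", "ICU", "intensive care unit", "Study"])` returns `["face mask", "intensive care unit"]`: the search term and the generic term are dropped, and “ICU” is merged with its spelled-out synonym.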
Analytical strategy
Keyword co-occurrence analysis
The idea behind keyword co-occurrence analysis (9) is that when words frequently occur together in the same documents, the concepts behind these words are likely to be closely related. By algorithmically extracting keywords and quantitatively analyzing how often they co-occur across a group of documents, we can establish how closely related the keywords are. From these results we can build a conceptual network structure of the research field (10). In our PubMed corpus, the keywords include both author-supplied keywords and MeSH terms.
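As a sketch of the counting that underlies a co-occurrence network, the Python function below (hypothetical name `cooccurrence_counts`) takes one cleaned keyword list per article and counts, for each pair of keywords, the number of articles in which both appear. It assumes full counting, where every shared article adds one to the link weight; the keyword lists are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count keyword frequencies and pairwise co-occurrences (full counting).

    documents: list of cleaned keyword lists, one per article.
    Returns (freq, pairs): freq[k] is the number of articles containing k;
    pairs[(a, b)] with a < b is the number of articles containing both.
    """
    freq = Counter()
    pairs = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # count each keyword once per article
        freq.update(unique)
        for a, b in combinations(unique, 2):  # sorted input gives a < b
            pairs[(a, b)] += 1
    return freq, pairs


# Invented example: three articles with their (already cleaned) keywords.
DOCS = [
    ["vaccine", "antibody", "mortality"],
    ["vaccine", "antibody"],
    ["mortality", "comorbidity"],
]
freq, pairs = cooccurrence_counts(DOCS)
```

Here `freq` would determine node sizes and `pairs` the link weights; `pairs[("antibody", "vaccine")]` is 2 because both keywords appear in the first two articles.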
We used the keywords to construct a two-dimensional keyword map in VOSviewer, whose layout is based on a framework for mapping and clustering (7). Keywords are positioned so that closely related keywords appear close to each other on the map. The size of a node reflects keyword frequency, and the thickness of a connecting line indicates the number of articles in which the two keywords co-occur. The keywords are clustered using an approach akin to modularity-based clustering (7): each keyword is placed in the cluster with whose members it co-occurs most frequently, and clusters are distinguished by color. The identified clusters were named to reflect their most prevalent themes, using the coding principles of grounded theory (11), including the steps of open and axial coding, to identify common topics within each cluster. This process included a cluster analysis of the co-occurrence map at a higher resolution and the use of weighted degree centrality (12) to identify the most prominent terms. The network map gives an overview of the research field, shows which topics are studied in conjunction with each other, and can indicate where there may be knowledge gaps.
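To illustrate what a higher clustering resolution does, the sketch below computes the standard generalized modularity score of a given clustering, Q = Σ_c [L_c / m − γ (d_c / 2m)²], where m is the total link weight, L_c the link weight inside cluster c, d_c the summed weighted degree of its nodes, and γ the resolution parameter. This is an illustration of modularity-based clustering in general, under that assumption: it scores candidate partitions but does not reproduce the optimization algorithm implemented in VOSviewer, and the six-keyword graph is invented.

```python
from collections import defaultdict


def modularity(edges, partition, resolution=1.0):
    """Generalised modularity of a clustering of a weighted, undirected graph.

    edges: list of (u, v, weight) tuples, each link listed once, no self-links.
    partition: dict mapping each node to a cluster id.
    resolution: gamma; higher values penalise large clusters more strongly.
    """
    m = 0.0                        # total link weight in the graph
    internal = defaultdict(float)  # link weight inside each cluster (L_c)
    degree = defaultdict(float)    # summed weighted degree per cluster (d_c)
    for u, v, w in edges:
        m += w
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)


# Invented keyword graph: two triangles joined by a single link c-d.
EDGES = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
ONE_CLUSTER = {node: 0 for node in "abcdef"}
TWO_CLUSTERS = {**{node: 0 for node in "abc"}, **{node: 1 for node in "def"}}
```

With γ = 0.1 the single all-inclusive cluster scores higher (0.9 versus about 0.81), whereas with γ = 1 the two-triangle partition scores higher (about 0.36 versus 0). This is why raising the resolution tends to split a map into more, smaller clusters.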
Bibliometric coupling analysis
In the bibliometric coupling analysis we examine the documents' reference lists to identify shared references. The extent of overlap between two reference lists is a measure of the strength of the connection between the documents (13). A large overlap, where two documents share many references, indicates that they are likely to address related topics. Little overlap suggests that the documents draw on distinct literatures and likely cover different topics. We present a two-dimensional map, created in VOSviewer, whose layout is determined using a unified framework for mapping and clustering (14). Articles are positioned so that the distance between nodes reflects their relatedness, and they are grouped into clusters that indicate shared themes. The size of a node indicates the number and strength of its connections to other articles. Articles that have no reference list, or that share no references with other articles, are not allocated to a cluster.
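A minimal Python sketch of coupling strength, assuming each document's reference list is available as a list of reference identifiers (the document and reference ids, and the function names, are invented for illustration): the strength of a link is the number of references two documents share, and documents with no references, or no shared references, end up without links and are therefore left unclustered.

```python
from itertools import combinations


def coupling_strengths(reference_lists):
    """Coupling strength for each pair of documents = number of shared references.

    reference_lists: dict mapping document id -> iterable of reference ids.
    Returns {(doc_a, doc_b): shared_count} for pairs sharing at least one reference.
    """
    refs = {doc: set(r) for doc, r in reference_lists.items()}
    links = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])  # size of the reference-list overlap
        if shared:
            links[(a, b)] = shared
    return links


def unclustered(reference_lists, links):
    """Documents with no references or no shared reference get no cluster."""
    linked = {doc for pair in links for doc in pair}
    return sorted(doc for doc in reference_lists if doc not in linked)


# Invented example corpus.
REFS = {
    "P1": ["r1", "r2", "r3"],
    "P2": ["r2", "r3", "r4"],
    "P3": ["r3"],
    "P4": ["r9"],  # shares nothing with the other documents
    "P5": [],      # no reference list
}
links = coupling_strengths(REFS)
```

Here P1 and P2 are linked with strength 2, while P4 and P5 are returned by `unclustered`. Comparing all pairs is quadratic in the number of documents; for a large corpus, an inverted index from each reference to the documents citing it would be the more practical design.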
To identify important nodes in each network graph, we calculate two centrality measures. The first, weighted degree centrality (hereafter centrality), is the sum of a node's links to other nodes, weighted by link strength, and indicates the importance of a node. Second, in the co-occurrence network graph some keywords connect the whole network or large parts of it; these represent generally important terms that are not specific to any one topic. We identify these terms by their high bridging centrality, a measure of how often a node lies on the shortest path between two other nodes (15).
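The two measures can be sketched in Python as follows. The definition given for bridging centrality (how often a node lies on shortest paths between other nodes) is that of betweenness centrality, so the sketch computes betweenness with Brandes' algorithm. It assumes shortest paths are counted in hops, ignoring link weights, and returns unnormalized scores; whether reference (15) uses a weighted or normalized variant is not stated here. The function names and the graph are invented for illustration.

```python
from collections import defaultdict, deque


def weighted_degree(edges):
    """Sum of link weights per node (the 'centrality' described in the text)."""
    strength = defaultdict(float)
    for u, v, weight in edges:
        strength[u] += weight
        strength[v] += weight
    return dict(strength)


def betweenness(edges):
    """Unnormalised betweenness on an undirected graph, counting hops only.

    Brandes' algorithm: a BFS from each source counts shortest paths (sigma),
    then path dependencies are accumulated in reverse BFS order.
    """
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint, so halve.
    return {node: value / 2 for node, value in score.items()}


# Invented co-occurrence graph: two keyword groups joined by the link c-d.
EDGES = [("a", "b", 3), ("b", "c", 1), ("a", "c", 2),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
```

In this example keyword `a` has the highest weighted degree (5) but a betweenness of 0, whereas `c` and `d`, which join the two groups, each have betweenness 6: the pattern the text describes for terms that connect large parts of the network.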