2.1 Search Strategies and Data Acquisition
The data in this study were retrieved from the Web of Science Core Collection (WoS-CC), PubMed, and Embase. To improve the precision of our search, we used thesauri to standardize the search terms[12]. In the WoS-CC, the search strategy was “((invisible OR clear OR removable OR thermoplastic OR non-bracket OR bracketless OR bracket-free) AND (aligner* OR appliance OR correct*)) OR invisalign AND orthodont*”, with the publication span limited to January 1, 2000 through September 30, 2023 and the results refined by research area (Dentistry, Oral Surgery & Medicine). A total of 1674 articles were retrieved; the 100 articles most strongly related to CAT with the highest citation frequency were selected and exported to Excel (full record), a plain-text file (full record and cited references), and EndNote desktop. In PubMed, the search terms were “Orthodontic Appliances, Removable”[Mesh] with publication dates 2000/1/1–2023/9/30, retrieving 2243 articles. Using Emtree terms in the same way, 540 articles were retrieved from Embase. Data on publication volume and article type were recorded in Excel.
Data on public interest were obtained from Google Trends[12]. Google Trends is a publicly accessible data source that reports the relative search volume (RSV) of queried terms[13]. We used “invisalign” as the search keyword because it is the most frequently searched term related to CAT. Then, based on the ranking of related queries in global searches, we selected “invisalign doctor”, “invisalign cost”, “smile”, “invisalign how long”, “invisalign near me” and “retainer” as our research objects. The weekly number of queries for these six terms (normalized to a 0–100 scale for processing and comparison) from December 2018 to mid-December 2023 was collected.
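The 0–100 normalization that Google Trends applies to RSV can be reproduced for illustration: the peak week is scaled to 100 and every other week is expressed relative to it. A minimal sketch with pandas, using made-up counts rather than the study's actual export:

```python
import pandas as pd

def normalize_rsv(series: pd.Series) -> pd.Series:
    """Rescale a raw weekly query-count series to Google Trends' 0-100
    relative search volume (RSV): the peak week maps to 100."""
    return (series / series.max() * 100).round().astype(int)

# Hypothetical raw weekly counts for one term (not real Google Trends data).
raw = pd.Series(
    [20, 50, 80, 40],
    index=pd.date_range("2018-12-02", periods=4, freq="W"),
)
rsv = normalize_rsv(raw)  # peak week (80 queries) becomes 100
```

The same rescaling lets series for different terms be compared on a common axis, which is what the six-term comparison below relies on.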
Two researchers, both formally trained and experienced in using the databases, independently examined the titles and abstracts and screened out the 100 articles. Interrater agreement during selection was measured with Cohen's kappa (κ) coefficient; the calculated κ was 1.0, indicating complete agreement[14]. Any disagreements about study inclusion were to be resolved through discussion. The trend curves for the six terms of public concern were likewise produced independently by the two researchers and then reconciled.
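Cohen's kappa compares the raters' observed agreement with the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e). As a worked illustration (not the authors' code), a minimal Python sketch on hypothetical include/exclude decisions:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' labels on the same items."""
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[lab] * c2[lab] for lab in set(r1) | set(r2)) / n**2
    if p_e == 1.0:  # degenerate case: both raters used a single identical label
        return 1.0 if p_o == 1.0 else 0.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical screening decisions by two raters.
kappa = cohens_kappa(["in", "out", "in", "in"], ["in", "out", "in", "in"])
```

Identical decision lists yield κ = 1.0, the "complete agreement" reported in the text; agreement no better than chance yields κ = 0.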
2.2 Bibliometric analysis and Prophet algorithm prediction
The two researchers independently used Microsoft Excel to organize the data and create basic charts, and VOSviewer and CiteSpace to visualize them, before agreeing on the most appropriate versions. Figure 1 depicts the workflow of the bibliometric analysis.
Using Microsoft Excel, the researchers summarized the information for each article in Table S1 (Supplementary file), including article title, authors, journal, publication year, article type, times cited, and addresses. In addition, the exported records were analyzed quantitatively by journal, annual publication volume, and article type, and the results were displayed in tables and figures. Meanwhile, WoS-CC was used to obtain the 2022 journal impact factor (IF), five-year impact factor (5-JIF), and category quartile of the top 10 journals, as well as the H-index of the top 10 most active authors, which were recorded in the corresponding tables.
VOSviewer, software developed at Leiden University, is designed to generate, display, and investigate maps from bibliographic data[15]. The plain-text file was imported into VOSviewer (Version 1.6.19) to generate visualized maps of co-cited references, keywords, authors, and countries. First, co-cited reference analysis was carried out: the minimum number of citations for a cited reference was set to 14, and the top 10 co-cited references were selected, displayed in a density visualization, and exported to Excel. The researchers then conducted citation analysis for countries, institutions, and authors, exporting the top ten most active of each to Excel. After that, a co-occurrence analysis of all keywords in the 100 most-cited articles was displayed as a network visualization; the relatedness of items is determined by the number of documents in which they occur together[16]. Similarly, co-authorship analysis for authors was displayed as a network visualization, and the corresponding analysis for countries was shown as an overlay visualization.
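The co-occurrence relatedness measure is simple to state in code: for every keyword pair, count the documents in which both appear. A minimal sketch (an illustration of the counting rule, not VOSviewer's implementation), using invented keyword lists:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """For every keyword pair, count the number of documents in which
    both occur -- the relatedness measure a co-occurrence map is built on."""
    counts = Counter()
    for kws in keyword_lists:
        # sorted() gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(kws)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical keyword lists from three articles.
docs = [
    ["clear aligner", "invisalign", "orthodontics"],
    ["clear aligner", "orthodontics"],
    ["invisalign", "retention"],
]
links = cooccurrence_counts(docs)
```

In a VOSviewer-style network the pair counts become link weights, and keywords with heavier combined links are drawn closer together.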
CiteSpace, developed by Professor Chaomei Chen, is a bibliometric analysis and visualization tool that examines aspects such as cooperation, keywords, internal structure, research trends, and dynamics[17]. First, after importing the plain-text file, CiteSpace (Version 6.1.R6) was used to create an institution co-occurrence map, in which nodes and links indicate weight and correlation. Documents published from 2001 to 2021 were chosen, with a time slice of 1 year and the node type set to institution. A minimum of 3 citations was set as the threshold, and the other settings followed the defaults. Second, the node type was changed to keyword and γ was set to 0.4 in the burstness section of the control panel, which yielded nine burst items. Research fronts and emerging trends can be identified by analyzing the keywords with the strongest citation bursts.
Prophet is an open-source time-series algorithm from Facebook that can model holiday effects and forecast time-series changes at weekly, monthly, and yearly scales[18]. We selected it to forecast potential search volume over the following two years because it fits periodic time-series data well.
All statistics and data analysis were programmed in Python: historical data were read in, future values were predicted with Prophet, and visualizations were created with Matplotlib.
Prophet divides time series into multiple components:
$$y\left(t\right)=g\left(t\right)+s\left(t\right)+h\left(t\right)+{\epsilon }_{t}$$
where \(g\left(t\right)\) is the trend term, representing the non-periodic trend of the time series; \(s\left(t\right)\) is the periodic term, representing changes of the time series within a given period; and \(h\left(t\right)\) represents the impact of irregularly occurring holidays on the predicted values. Holiday effects were not included in our data. The error term \({\epsilon }_{t}\) captures fluctuations, assumed to follow a Gaussian distribution, that are not picked up by the model. The predicted value is obtained by summing the components that the Prophet algorithm fits to the available data.
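The additive decomposition can be written directly as code. A minimal sketch with toy components (the lambdas below are illustrative, not fitted values; h(t) is zero because holiday effects were not modelled in this study):

```python
import math

def prophet_like_forecast(t, trend, seasonal, holiday=None):
    """Additive model y(t) = g(t) + s(t) + h(t); the Gaussian error
    term epsilon_t is omitted because it is not part of the forecast."""
    h = holiday(t) if holiday is not None else 0.0
    return trend(t) + seasonal(t) + h

# Toy components, assumed for illustration only.
g = lambda t: 0.5 * t + 10                         # linear trend
s = lambda t: 3 * math.sin(2 * math.pi * t / 52)   # yearly cycle in weeks
y_hat = prophet_like_forecast(26, g, s)            # half a year in: s(26) ~ 0
```

At t = 26 the seasonal term is at a zero crossing of the sine, so the prediction is essentially the trend value alone.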
We employed a trend term based on Prophet's piecewise-linear functions. The positions of the changepoints must be determined so that the model can accommodate shifts in the overall trend[18]; we let the software detect changepoints automatically at the appropriate locations. Suppose the original growth rate is \(k\) and the growth-rate change at timestamp \({s}_{j}\) is \({\delta }_{j}\). We then defined an indicator function:
$${a}_{j}\left(t\right)=\left\{\begin{array}{ll}1, & t\ge {s}_{j}\\ 0, & \text{otherwise}\end{array}\right.$$
where \({a}_{j}\left(t\right)\) determines whether the growth-rate change \({\delta }_{j}\) applies at time \(t\). As a result, the growth rate at timestamp \(t\) was \(k+{a\left(t\right)}^{T}\delta\).
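The indicator and the resulting rate are straightforward to compute. A minimal sketch with invented changepoints and rate adjustments:

```python
def indicator(t, s_j):
    """a_j(t): 1 once timestamp t has reached changepoint s_j, else 0."""
    return 1.0 if t >= s_j else 0.0

def growth_rate(t, k, changepoints, deltas):
    """k + a(t)^T delta: the base rate k plus every rate adjustment
    delta_j whose changepoint s_j lies at or before t."""
    return k + sum(indicator(t, s) * d for s, d in zip(changepoints, deltas))

# Hypothetical values: base rate 1.0, changepoints at t = 3, 7, 12.
rate_at_10 = growth_rate(10, 1.0, [3, 7, 12], [0.5, -0.2, 9.0])
```

At t = 10 only the first two changepoints have been passed, so the rate is 1.0 + 0.5 − 0.2; the adjustment at t = 12 has not yet taken effect.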
To keep the trend continuous at the boundaries of the line segments, we computed:
$${\gamma }_{j}=\left({s}_{j}-m-{\sum }_{l<j}{\gamma }_{l}\right)\left(1-\frac{k+{\sum }_{l<j}{\delta }_{l}}{k+{\sum }_{l\le j}{\delta }_{l}}\right)$$
In summary, the expression for \(g\left(t\right)\) was obtained as follows:
$$g\left(t\right)=\left(k+{a\left(t\right)}^{T}\delta \right)t+\left(m+{a\left(t\right)}^{T}\gamma \right)$$
where \(k\) represents the initial growth rate, \(m\) the offset parameter, \(a\left(t\right)={\left({a}_{1}\left(t\right),\dots ,{a}_{S}\left(t\right)\right)}^{T}\), \(\delta ={\left({\delta }_{1},\dots ,{\delta }_{S}\right)}^{T}\), and \(\gamma ={\left({\gamma }_{1},\dots ,{\gamma }_{S}\right)}^{T}\).
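For the piecewise-linear trend, the continuity condition at each changepoint simplifies to choosing \({\gamma }_{j}=-{s}_{j}{\delta }_{j}\) (equating the trend on both sides of \({s}_{j}\) gives this directly). A minimal sketch of \(g(t)\) under that choice, with illustrative parameter values:

```python
def trend(t, k, m, changepoints, deltas):
    """g(t) = (k + a(t)^T delta) t + (m + a(t)^T gamma), with the offsets
    gamma_j = -s_j * delta_j that make the piecewise-linear trend
    continuous at every changepoint."""
    a = [1.0 if t >= s else 0.0 for s in changepoints]
    rate = k + sum(ai * d for ai, d in zip(a, deltas))
    offset = m + sum(ai * (-s * d) for ai, s, d in zip(a, changepoints, deltas))
    return rate * t + offset

# Hypothetical parameters: base rate 1.0, offset 0, one changepoint at t = 4
# where the rate increases by 0.5.
k, m, cps, deltas = 1.0, 0.0, [4.0], [0.5]
```

Before the changepoint the trend is t; after it the slope becomes 1.5, but the offset correction keeps the two segments joined at t = 4.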
\(s\left(t\right)\) is the periodic term, also known as the seasonal term, which describes the periodic changes in the model over the course of a time period[19]. Prophet uses a Fourier series to simulate periodic changes in the time series:
$$s\left(t\right)={\sum }_{n=1}^{N}\left({a}_{n}\cos\left(\frac{2\pi nt}{T}\right)+{b}_{n}\sin\left(\frac{2\pi nt}{T}\right)\right)$$
where \(T\) is the period of the time series and \(N\) is the number of Fourier terms used to approximate the seasonal pattern.
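The truncated Fourier series is easy to evaluate directly. A minimal sketch for weekly data with a yearly cycle (T = 52); the coefficients below are illustrative, not fitted values:

```python
import math

def fourier_seasonality(t, period, a, b):
    """s(t) = sum_{n=1}^{N} a_n cos(2*pi*n*t/T) + b_n sin(2*pi*n*t/T),
    a truncated Fourier series with N = len(a) terms and period T."""
    return sum(
        a_n * math.cos(2 * math.pi * (n + 1) * t / period)
        + b_n * math.sin(2 * math.pi * (n + 1) * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b))
    )

# Hypothetical coefficients for N = 2 terms over a 52-week period.
s_start = fourier_seasonality(0, 52, [1.0, 0.5], [0.2, -0.1])
s_one_period = fourier_seasonality(52, 52, [1.0, 0.5], [0.2, -0.1])
```

By construction the function repeats after one full period, so s(0) and s(52) agree; larger N lets the series capture sharper seasonal shapes at the cost of more parameters.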
To suppress noisy data and prevent over-fitting to small fluctuations in the curve, we adjusted the changepoint prior scale to control the flexibility of the trend and selected appropriate changepoints, using the formulas derived from the above parameters for model fitting and prediction. For the keyword “smile”, we set the changepoint_range parameter to 0.65; for all other keywords it was kept at the default of 0.8, and all other parameters were left at their defaults. In the resulting figures, the black scatter points represent the original data, the blue curve the fitted and predicted values, the gray area the uncertainty interval, the vertical red dashed lines the potential trend changepoints, and the solid red line the overall trend of the time series.
For every keyword, we acquired the RSV curve data and fitted an appropriate model curve with the associated forecast.