Levac et al. (14) detail a scoping review methodology composed of five mandatory phases: (1) identify the research question, (2) identify the relevant studies, (3) select the studies, (4) chart the data, and (5) collate, summarize, and report the results. A sixth phase, consultation, is optional.
This systematic review will be structured according to the model described above, applying only the mandatory phases. The researchers will use the StArt¹ software to assist in defining and executing the research protocol.
Identifying the research question
The main objective of this systematic review is to verify the feasibility of using machine learning (ML) and deep learning (DL) techniques to predict cases of dengue. To answer the main question and achieve this goal, the team used the strategy defined by the acronym PICO: Population (P): patients with dengue; Intervention (I): assisting health systems in predicting dengue cases; Comparison (C): comparing which ML and DL techniques are used in the prediction; Outcomes (O): the results of the predictions and the evaluation of their performance. Furthermore, to make the question easier to answer, the team divided the problem into five parts: techniques, data, approach, validation, and outcomes. The main question was then subdivided into the following sub-questions, one for each part:
1. Techniques: Which machine learning and deep learning techniques are used in predictions?
2. Data: In which country was the study conducted? How were the data collected?
3. Approach: How many years of data were used in the models, and which factors were considered when creating them (for example, climatic factors, economic factors, or data from social networks)?
4. Validation: How were the models validated? Which statistical techniques did the study use to evaluate their performance?
5. Outcomes: Which technique or combination of techniques achieved the best results, and why?
Identifying relevant studies
After defining the main research question and its sub-questions, the next step was to decide which databases would be relevant for the review. To find the studies, the team used the main electronic databases in the areas of Health and Computer Science: Scopus, IEEE Xplore², PubMed (Medline)³, ACM Digital Library⁴, and Web of Science⁵. Other databases, such as Cochrane, were tested, but they either indexed few articles on the subject of this review or offered limited access to data. Once the databases were defined, the team began studying the terms that would form the search string.
The terms used in the search string refer to forecasting, machine learning (here, deep learning is understood as a type of ML), and dengue. They are: predict* (which covers terms such as predict, prediction, and predicted), forecast* (which covers forecast, forecasting, and forecasted), machine learning (no variations), deep learning (no variations), and dengue (which covers dengue fever, dengue hemorrhagic fever and, in some countries, simply dengue).
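The way the truncated terms combine with the logical operators can be illustrated with a small local filter. This is only a sketch, assuming that `*` stands for any run of word characters; each database applies its own matching rules, and the helper names `wildcard_to_regex` and `matches_query` are hypothetical.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a search term with an optional trailing '*' into a
    case-insensitive, whole-word regular expression."""
    # re.escape turns '*' into '\*'; replace it with '\w*' (any word characters).
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)

def any_term(text: str, terms) -> bool:
    """True if at least one term matches the text (logical OR)."""
    return any(wildcard_to_regex(t).search(text) for t in terms)

def matches_query(text: str) -> bool:
    """(predict* OR forecast*) AND ("machine learning" OR "deep learning") AND dengue."""
    return (
        any_term(text, ("predict*", "forecast*"))
        and any_term(text, ("machine learning", "deep learning"))
        and any_term(text, ("dengue",))
    )
```

For instance, a title containing "Forecasting", "deep learning", and "dengue" satisfies all three groups, whereas a title that mentions dengue prediction without either ML term does not.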
The search string was tested on the five selected databases in rounds. In each round, small adjustments were made to the terms and logical operators. The team evaluated the quality of the string by analyzing the relevance and quality of the returned articles and their relationship to the topic of this research. In addition, the team chose seventeen articles and classified them as indispensable in the search results: if any of the seventeen was not returned, the adjustment to the string was discarded. Finally, after four rounds, the team defined the following strings for each database:
- Scopus: TITLE-ABS-KEY ((predict* or forecast*) AND ("machine learning" or "deep learning") AND (dengue));
- IEEE: ((("Full Text & Metadata":predict* or forecast*) AND "Full Text & Metadata":machine learning or deep learning) AND "Full Text & Metadata":dengue);
- PubMed: All Fields ((predict* or forecast*) AND ("machine learning" or "deep learning")) AND (dengue);
- ACM: [All: predict or forecast] AND [All: machine learning or deep learning] AND [All: dengue] AND [All: predict* or forecast*] AND [All: machine learning or deep learning] AND [All: dengue];
- Web of Science: TOPIC: (predict* or forecast*) AND TOPIC: (machine learning or deep learning) AND TOPIC: (dengue)
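The acceptance rule used in each round (an adjustment is kept only if every indispensable article is still returned) can be sketched as a simple set check. This is a minimal illustration, assuming that articles are compared by normalized title; the function names are hypothetical and the comparison key actually used by the team is not stated.

```python
def normalize(title: str) -> str:
    """Lowercase a title and collapse whitespace so that trivial
    formatting differences do not hide a match."""
    return " ".join(title.lower().split())

def missing_benchmark_articles(returned_titles, benchmark_titles):
    """Return the indispensable articles that the search did not retrieve."""
    returned = {normalize(t) for t in returned_titles}
    return [t for t in benchmark_titles if normalize(t) not in returned]

def accept_adjustment(returned_titles, benchmark_titles) -> bool:
    """An adjusted string is accepted only if no indispensable article is missing."""
    return not missing_benchmark_articles(returned_titles, benchmark_titles)
```

In practice, `benchmark_titles` would hold the seventeen indispensable articles and `returned_titles` the results exported from a database after running the adjusted string.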
Study selection
The goal of this review is to understand whether it is possible to predict dengue cases with machine learning and deep learning techniques. In addition, it surveys the main approaches and identifies which techniques are producing the best results.
Machine learning and deep learning are current, constantly evolving topics that are widely researched in medicine and computer science. Studies that predict dengue cases using ML or DL techniques will be selected. To keep the selection aligned with the research question, the team defined inclusion (INC) and exclusion (EXC) criteria. They are:
- INC01 - Does the study use machine learning techniques?
- INC02 - Does the study use deep learning techniques?
- INC03 - Was the study statistically validated?
- INC04 - Does the study use and compare more than one ML or DL technique or model?
- INC05 - Does the study contain a prediction of dengue cases?
- INC06 - Is the paper written in English or Portuguese?
- INC07 - Does the study address risk of bias (at the outcome or study level)?
- EXC01 - Duplicate publications. Some databases, such as Scopus, index articles that also appear in other databases, so some articles may be retrieved more than once;
- EXC02 - Publications that do not meet the inclusion criteria listed above.
After defining the criteria, the team began screening articles using the concepts and techniques described in PRISMA (15). Initially, this stage was carried out by two researchers. To avoid influencing the results, each one classified the articles individually, without knowing the other researcher's decisions.
In this stage of the process, the following steps were performed: (1) eliminate duplicate articles; (2) quickly read the abstract and results of each article; (3) apply the previously defined inclusion and exclusion criteria. Whenever an article was excluded under any criterion, the justification had to be recorded. Once this process was complete, the two sets of results were compared. Articles with conflicting decisions were screened by a third researcher, who discussed the points raised by the first two researchers and then issued the final decision on those articles.
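The screening workflow above (duplicate removal, two independent classifications, and a third reviewer for conflicts) can be sketched as follows. This is an illustrative outline rather than the procedure implemented in StArt: the record fields `doi` and `title`, the decision labels, and the function names are assumptions, and duplicates are detected by DOI when one is present and by normalized title otherwise.

```python
def deduplicate(records):
    """Keep the first occurrence of each article, keyed by DOI when
    available and by normalized title otherwise."""
    seen, unique = set(), []
    for rec in records:
        key = (rec.get("doi") or "").lower() or " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def find_conflicts(reviewer_a, reviewer_b):
    """Return the ids of articles on which the two reviewers disagree."""
    return sorted(aid for aid in reviewer_a if reviewer_a[aid] != reviewer_b[aid])

def resolve(reviewer_a, reviewer_b, third_reviewer):
    """Keep agreed decisions; use the third reviewer's decision for conflicts."""
    final = {}
    for aid, decision in reviewer_a.items():
        if decision == reviewer_b[aid]:
            final[aid] = decision
        else:
            final[aid] = third_reviewer[aid]
    return final
```

Each reviewer's decisions are represented here as a dictionary from an article id to "include" or "exclude"; the third reviewer only needs to supply decisions for the ids returned by `find_conflicts`.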
Charting the data
In the Charting the Data stage, the team defined the information to be extracted from the articles. A team meeting was held to agree on this information. As a result of the meeting, the researchers reached a consensus to extract the following data:
- Bibliographic data;
- Which ML or DL techniques were used;
- In which country the study took place;
- How the data were collected, and whether they come from official databases;
- How many years of data were used in the samples;
- Which statistical techniques were used to validate the predictions;
- Which technique performed best;
- Reviewer's considerations about the study.
The StArt tool allows researchers to add custom fields for cataloging data from the articles. As the articles were read, each reviewer individually entered the information in these fields. Finally, all the data for each article, including the custom fields, were exported to a file in .xls format. The export was done separately for each reviewer, so the separate files had to be combined into a final file. Conflicts arose at this stage and, once again, they were resolved through the analysis of a third evaluator.
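Combining the two reviewers' exports and flagging the fields on which they disagree could look like the sketch below. It assumes the .xls exports have already been loaded into dictionaries keyed by article id; the field names and the function `merge_extractions` are hypothetical and do not reflect the actual StArt export layout.

```python
def merge_extractions(reviewer_a, reviewer_b):
    """Merge two extractions of the form {article_id: {field: value}}.

    Returns (merged, conflicts). Fields on which the reviewers agree are
    copied to `merged`; disagreements are set to None in `merged` and listed
    in `conflicts` as (article_id, field, value_a, value_b) for the third
    evaluator to resolve.
    """
    merged, conflicts = {}, []
    # Only articles extracted by both reviewers are compared.
    for aid in sorted(reviewer_a.keys() & reviewer_b.keys()):
        merged[aid] = {}
        for field, a_val in reviewer_a[aid].items():
            b_val = reviewer_b[aid].get(field)
            if a_val == b_val:
                merged[aid][field] = a_val
            else:
                merged[aid][field] = None
                conflicts.append((aid, field, a_val, b_val))
    return merged, conflicts
```

Leaving a conflicting field empty until the third evaluator decides avoids silently favoring one reviewer's value in the final file.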
Collating, summarizing, and reporting the results
For collating, summarizing, and reporting the results, Levac et al. (14) suggest a division into three topics: analysis, reporting, and implications. We will follow this suggestion, with the following details:
- Analysis: the analysis will be divided into two steps. First, the team will carry out the quantitative analysis, which will consider the bibliographic data and the information about machine and deep learning, in addition to the data defined in the Charting the data section. In the quantitative analysis, the studies will be classified and checked for how they answer the research questions. Next, a qualitative analysis will be carried out, combining the data and the ML and DL methods discussed in the articles with the considerations of our team of researchers.
- Reporting: to give the results a clearer structure, a table will be created in this phase, organized in columns, listing the strengths and weaknesses of the articles.
- Implications for future research: a technical opinion will be written about the models used in the prediction of dengue. The opinion will follow the model suggested by (16) and will serve as a basis for future research on the topic.
Patient and Public Involvement
There was no involvement of the public or patients in this study.
¹ http://lapes.dc.ufscar.br/tools/start_tool
² https://ieeexplore.ieee.org/Xplore/home.jsp
³ https://www.ncbi.nlm.nih.gov/pubmed/
⁴ https://dl.acm.org/
⁵ www.webofknowledge.com