Dengue Prediction Through Machine Learning and Deep Learning: A Scoping Review Protocol.

Background: Dengue is an endemic disease caused by the DENV virus. The virus has four serotypes (DENV1, DENV2, DENV3 and DENV4). All of them can cause the disease and, once infected with one serotype, the patient is not immune to the others. Due to this particularity of the virus, as well as the ease with which the transmitting mosquito reproduces, approximately 4.3 million people suffered from this disease in 2019. Although it is not a new disease, there is still no effective vaccine against the virus. The best form of combat is preventing mosquito proliferation. In this context, machine learning and deep learning techniques have been used to predict dengue cases. In this work we present a scoping review to clarify how dengue cases can be predicted through machine learning and deep learning.
Methods: This scoping review will follow the methodology defined in the article “Scoping studies: advancing the methodology”. The methodology consists of six phases. We chose to use only the mandatory ones: 1 - Identify the research question, 2 - Identify the relevant studies, 3 - Select the studies, 4 - Map the data and 5 - Compile, summarize and make the report. The main research question is to verify the feasibility of using machine learning and deep learning in the prediction of dengue cases. Derived from this question, the review will investigate which machine learning and deep learning techniques are used, where the studies are carried out, which data are being used, how the models are validated and which produce better results. The review used the following electronic databases: Scopus Document Search, IEEE Xplore Digital Library, PubMed, ACM Digital Library, and Web of Science.
Results: After completing this study, a technical-scientific opinion was created and the suggested protocol was executed. As a result of the execution, 301 papers were selected and 14 approved.
Conclusions: We can prove the effectiveness of using Machine Learning and Deep Learning techniques to predict dengue cases.

Systematic Review registrations: Submitted on October 16, 2020, to the Open Science Framework.

Background
Dengue is an endemic disease caused by the DENV virus and transmitted by the Aedes aegypti mosquito (1). There are four variations of the virus (DENV1, DENV2, DENV3 and DENV4) and all can cause the disease (2). The main symptoms of dengue are high fever, muscle pain, malaise, lack of appetite and headaches. In some more severe cases, dengue can cause bleeding and difficulty breathing, and can lead to the patient's death (3).
Although it is not a new disease, there is still no effective vaccine that immunizes the population against all types of the virus. Once infected with any of the four types, the patient acquires immunity to that type but remains susceptible to the others. Additionally, in cases of reinfection by another type, the symptoms are more severe (4). Therefore, efforts to combat dengue are directed against the proliferation of the vector.
In 2019, the World Health Organization (WHO) recorded approximately 4.2 million cases of dengue worldwide. Earlier, the same agency had issued an alert classifying dengue as one of the main diseases for the year 2019 (5). In Brazil, due to the circulation of a new type of the virus (DENV2), there was a new outbreak of dengue in 2019, with cases increasing by 149% in some states (6).
The Aedes aegypti mosquito reproduces in standing water and finds an ideal environment for reproduction in countries with a tropical climate and high rainfall (7). To combat the disease, governments invest in public awareness campaigns that ask for the correct disposal of tires and open containers, since these can accumulate water and become breeding grounds for the mosquito (8).
Given the global health problem caused by dengue and the limited means of combating it, machine learning (ML) and deep learning (DL) techniques have emerged as allies. The idea is to create models for predicting dengue cases and outbreaks, thus providing governments with accurate information and helping managers in decision making.
Using machine learning techniques, the Canadian company BlueDot published, in January 2020, an article about the possible spread of COVID-19 through air traffic from China (9). If the recommendations had been taken seriously and restrictions had been imposed, how many of the 996,342 lives lost to COVID-19 (as of 28 September 2020) could have been saved (10)?
Disease prediction is not an easy task. Several factors influence and affect the creation of models, for example climatic, economic and social factors, and urban mobility (11). Doni et al. (12) conducted a study in India using deep learning techniques to analyze climate data, such as temperature, precipitation and humidity, combined with historical data on dengue. Their model was able to predict dengue cases with an accuracy of 89%. Another example of prediction through ML and DL comes from Thailand (13), where, in addition to climate data, search data obtained from Google Trends was used.
When searching databases such as Scopus, more than 107 articles were returned on the prediction of dengue cases using machine learning and deep learning. This demonstrates the attention given by science to the search for a prediction model. Therefore, a scoping review is justified to verify which machine learning or deep learning techniques are being used, where studies are being carried out, and how and what data are being used for predictions. Finally, we show which techniques are achieving the best results and why.

Methods And Design
Levac et al. (14) detail a scoping review methodology composed of five mandatory phases: 1 - Identify the research question, 2 - Identify the relevant studies, 3 - Select the studies, 4 - Map the data and 5 - Compile, summarize and make the report. Additionally, there is a sixth phase, called consultation, which is optional.
This scoping review will be structured based on the model mentioned above. However, only the mandatory phases will be applied. Moreover, the researchers will use the StArt software to assist in the definition and execution of the research protocol.

Identifying the research question
The main objective of this scoping review is to verify the feasibility of using machine learning and deep learning techniques in predicting cases of dengue. To reach this goal, the team used the strategy defined by the acronym PICO: Population (P): patients with dengue; Intervention (I): assisting health systems in predicting dengue cases; Comparisons (C): comparing which ML and DL techniques are used in the prediction; Outcomes (O): the results of the predictions, as well as the evaluation of their performance. Furthermore, to refine the strategy, the team divided the problem into five parts: techniques, data, approach, validation and outcomes. Subdivisions of the main question were then created for each part:
1. Techniques: Which machine learning and deep learning techniques are used in predictions?
2. Data: In which country was the study conducted? How was the data collected?
3. Approach: How many years of data were used in the models, and which factors were considered when creating them (for example, climatic factors, economic factors, or data from social networks)?

Identifying relevant studies
After defining the first research question and its derivations, the next step was to define which databases would be relevant for the review. The studies were searched in the main electronic databases in the areas of Health and Computer Science: Scopus, IEEE Xplore, PubMed (MEDLINE), ACM Digital Library and Web of Science. Other databases, such as Cochrane, were tested, but they either did not index articles on the subject of this review well or offered limited access to data. Once the databases were defined, the team began studying the terms that would form the search string.
The terms used in the search string refer to forecasting, machine learning (here, deep learning is understood as a type of ML), and dengue. They are: predict* (covering terms such as predict, prediction, predicted), forecast* (covering forecast, forecasting, forecasted), machine learning (no variation), deep learning (no variation) and dengue (covering dengue fever, dengue hemorrhagic fever and, in some countries, only dengue).
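As an illustration only, the way such term groups combine into a Boolean query can be sketched in Python. The helper `build_search_string` is hypothetical and is not part of the protocol or of any database API; it assumes that synonyms within a group are joined with OR, that groups are joined with AND, and that multi-word phrases are quoted. The `TITLE-ABS-KEY` wrapper is the Scopus field code; other databases use different field syntax.

```python
def build_search_string(term_groups, field=None):
    """Join synonyms with OR inside a group and join the groups with AND.

    Hypothetical helper for illustration; not part of the review protocol.
    """
    clauses = []
    for group in term_groups:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(terms) + ")")
    query = " AND ".join(clauses)
    # Optionally wrap the query in a database-specific field code.
    return f"{field}({query})" if field else query


# Term groups taken from the description above.
TERM_GROUPS = [
    ["predict*", "forecast*"],
    ["machine learning", "deep learning"],
    ["dengue"],
]

scopus_query = build_search_string(TERM_GROUPS, field="TITLE-ABS-KEY")
generic_query = build_search_string(TERM_GROUPS)
print(scopus_query)
```

Keeping the term groups as data makes each adjustment round a one-line change, so the same groups can be rendered for each database's syntax.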
The search string was tested on the five selected databases in rounds. In each round, small adjustments were made to the terms and logical operators. The team evaluated the quality of the string by analyzing the relevance and quality of the returned articles and their relationship to this research. Moreover, seventeen articles were chosen by the team and classified as indispensable in the search results.
Therefore, if any of the seventeen articles was not returned, the adjustment to the string was discarded. Finally, after four rounds, the team defined the following strings for each database:
Scopus: TITLE-ABS-KEY ((predict* or forecast*) AND ("machine learning" or "deep learning") AND (dengue));
EXC02: Publications that do not meet the inclusion criteria mentioned above.
After defining the criteria, the team began screening articles using the concepts and techniques described in PRISMA (15). Initially, this stage was carried out by two researchers. To avoid influencing the results, each one classified the articles individually, without knowing the other researcher's result.

In this stage of the process, the following steps were performed: 1 - Eliminate duplicate articles; 2 - Quickly read the abstract and results of the articles; 3 - Apply the inclusion and exclusion criteria previously defined. Any exclusion, under any criterion, must have its justification recorded. After this process, the results obtained by the two researchers were compared. Articles with conflicting opinions were screened by a third researcher, who discussed the points raised by the previous researchers and issued the final decision on those articles.
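The screening steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual tooling (the protocol uses StArt): the record fields, article ids and titles are invented for the example, duplicates are detected by normalized title only, and each reviewer's decision is reduced to "include" or "exclude".

```python
def deduplicate(records):
    """Keep the first record seen for each normalized title."""
    seen, unique = set(), []
    for record in records:
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def reconcile(decisions_a, decisions_b):
    """Separate agreed decisions from conflicts that need a third reviewer."""
    agreed, conflicts = {}, []
    for article_id in sorted(decisions_a.keys() & decisions_b.keys()):
        if decisions_a[article_id] == decisions_b[article_id]:
            agreed[article_id] = decisions_a[article_id]
        else:
            conflicts.append(article_id)
    return agreed, conflicts


# Invented example data (hypothetical ids and titles).
records = [
    {"id": "A1", "title": "Dengue forecasting with neural networks"},
    {"id": "A2", "title": "dengue  forecasting with Neural Networks"},
    {"id": "A3", "title": "Climate drivers of dengue outbreaks"},
]
unique_records = deduplicate(records)

# Each reviewer classifies independently; the results are compared afterwards.
reviewer_a = {"A1": "include", "A3": "exclude"}
reviewer_b = {"A1": "include", "A3": "include"}
agreed, conflicts = reconcile(reviewer_a, reviewer_b)
print(agreed, conflicts)
```

Only the ids listed in `conflicts` would go to the third researcher; the justification required for each exclusion is left out of this sketch for brevity.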
Charting the data
In the Charting the data stage, the team defines the information to be extracted from the articles. A team meeting was held to align on this information. As a result of the meeting, the researchers reached a consensus to extract the following data: bibliographic data; which ML or DL techniques were used; in which country the study took place; how the data was collected; whether the data came from official databases; how many years of data were used in the samples; which statistical techniques were used to validate predictions; which technique performed best; and the reviewer's considerations about the study.
The StArt tool allows researchers to add fields to catalog data from articles. As the articles were read, the reviewers individually entered the information in these fields. Finally, all item data, including custom fields, were exported to a file in .xls format. The export was done individually for each reviewer, so the separate files needed to be combined into a final file. At this stage conflicts arose and, once again, they were resolved through the analysis of a third evaluator.
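One possible way to combine two reviewers' exports and surface field-level conflicts is sketched below. It assumes the .xls rows have already been loaded into lists of dictionaries (reading the spreadsheet itself would need a third-party library and is omitted); the function names, the `article_id` key and the example values are hypothetical and do not reflect StArt's actual export layout.

```python
def normalize(value):
    """Compare values case-insensitively and ignoring extra whitespace."""
    return " ".join(str(value).lower().split())


def merge_extractions(rows_a, rows_b, key="article_id"):
    """Merge two reviewers' rows and list the fields on which they disagree."""
    rows_b_by_id = {row[key]: row for row in rows_b}
    merged, conflicts = [], []
    for row_a in rows_a:
        row_b = rows_b_by_id.get(row_a[key], {})
        combined = {key: row_a[key]}
        for field, value_a in row_a.items():
            if field == key:
                continue
            value_b = row_b.get(field, "")
            if normalize(value_a) == normalize(value_b):
                combined[field] = value_a
            else:
                # Leave the field open until the third evaluator decides.
                combined[field] = None
                conflicts.append((row_a[key], field, value_a, value_b))
        merged.append(combined)
    return merged, conflicts


# Invented example rows (hypothetical values).
reviewer_a_rows = [{"article_id": "A1", "country": "India", "technique": "LSTM"}]
reviewer_b_rows = [{"article_id": "A1", "country": "india ", "technique": "Random forest"}]

merged, conflicts = merge_extractions(reviewer_a_rows, reviewer_b_rows)
print(merged)
print(conflicts)
```

Normalizing before comparison keeps trivial differences such as capitalization from being reported as conflicts, so the third evaluator only sees substantive disagreements.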
Collating, summarizing, and reporting the results
For the collation, summarization and reporting of results, Levac et al. (14) suggest a division into three topics: analysis, reporting and implications. We will follow this suggestion as detailed below.
Analysis: the analysis will be divided into two stages. First, the team will carry out the quantitative analysis. Here, bibliographic data and information about machine learning and deep learning will be considered, in addition to the data defined in the Charting the data stage. In the quantitative analysis, the studies will be classified, and the team will verify how the articles answer the research questions. Next, a qualitative analysis will be carried out, in which the data and the ML and DL methods discussed in the articles will be combined with the considerations of our team of researchers.
Reporting: for better structure, a table will be created in this phase, with columns listing the strengths and weaknesses of the articles.
Implications for future research: A technical opinion will be created about the models used in the prediction of dengue. The opinion will follow the model suggested by (16) and will serve as a basis for future research on the topic.

Patient and Public Involvement
There was no involvement of the public or patients in this study. With the information from this review, future research will be able to verify which techniques and approaches work best, as well as which gaps remain to be filled in this area. Due to the lack of an effective vaccine, mosquito prevention is still the best way to fight the disease. A model that predicts cases accurately will help governments fight dengue.

Declarations
Ethics and Dissemination
As this is a scoping review, ethical approval is not necessary, since primary data will not be collected.

Consent for publication
This review does not contain data from any individual person.