Survey on the Application of Deep Learning in the Internet of Things (IoT)


 The Internet of Things (IoT) is a network of physical devices, software, and sensors, all connected to the Internet. The IoT produces massive amounts of data, and this enormous volume of data makes it possible to use deep learning algorithms (DLAs). Recently, the growth of large bodies of data and their availability has been one of the main reasons for the attention paid to this topic. Furthermore, recent hardware upgrades that boost computational power have enabled the use of deep learning alongside the IoT. Therefore, the purpose of the present research is to review the relevant conference and journal articles on IoT and deep learning published from 2012 to July 2019. To review the publications, a combination of systematic mapping and systematic literature review has been employed to create this survey. Accordingly, several research questions have been raised, and 32 articles have been investigated to answer them. The papers have been categorized into four groups, focusing on data, network, computing environment, and application, and each group has been examined and analyzed. This article would be beneficial for researchers who want to investigate the field of deep learning and IoT.


Introduction
Devices such as mobile phones, transportation vehicles, and home facilities can be used as data-processing devices and connected to an IoT network. This network, a new phenomenon in recent technology, makes it possible to control these devices remotely [1,2,3]. IoT applications can be found in a wide variety of domains, and the main component of many of them is an intelligent learning method for prediction, pattern recognition, data mining, or data analysis. Among the numerous machine learning strategies, researchers have observed extensive use of deep learning in IoT applications in recent years. Deep learning and the IoT were among the three main technologies for 2017 announced at the Gartner Symposium/ITxpo 2016. This extensive attention to deep learning highlights the fact that traditional machine learning methods cannot meet the new analytical requirements of the IoT. Instead, IoT systems require artificial intelligence techniques and data analytics methods suited to the hierarchical process of IoT data generation and data management [4,5,6].
Deep learning refers to machine learning models that learn a hierarchical representation of data. Deep neural networks is a general term for a set of neural networks with multi-layer architectures; these networks show how many layers can successfully build the representational structures required for deep learning. Supervised or unsupervised learning algorithms can be used to adjust the weights of these networks. Deep learning methods transform the input data hierarchically, through a multi-layer model, into a set of features, which are finally handed over to a classifier for classification [7].
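As a minimal sketch of this layer-by-layer transformation (our own illustration, not a model from [7]), the NumPy code below passes inputs through three hidden layers and hands the last layer's features to a softmax classifier. The layer widths, the ReLU activation, and the three-class output are hypothetical choices, and the weights are random; in practice they would be adjusted by a supervised or unsupervised learning algorithm.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def deep_forward(x, hidden_layers, w_out, b_out):
    """Map inputs through successive hidden layers, then classify the final features."""
    h = x
    for w, b in hidden_layers:  # each layer re-represents the previous layer's output
        h = relu(h @ w + b)
    return softmax(h @ w_out + b_out)  # class probabilities from the last feature layer

rng = np.random.default_rng(0)
sizes = [16, 32, 16, 8]  # hypothetical widths: 16 raw inputs, then three hidden layers
hidden_layers = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
                 for m, n in zip(sizes[:-1], sizes[1:])]
w_out, b_out = rng.normal(0.0, 0.1, (sizes[-1], 3)), np.zeros(3)  # 3 hypothetical classes
probs = deep_forward(rng.normal(size=(5, 16)), hidden_layers, w_out, b_out)
```

Each row of `probs` is a probability distribution over the three hypothetical classes for one input sample.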
In applications with a very large number of states, a deep model may be used to estimate action values (for example, how good it is to take a particular action in a certain situation). Systems that combine deep learning with reinforcement learning are still considered to be in their early stages, but competitive results have recently been achieved in some applications (such as video games). Further, supervised and unsupervised learning approaches are expected to improve in the future [8]. IoT applications can benefit from decision-making processes for learning purposes. For example, in the case of residential services, location estimation can be considered a decision-making process in which a software agent determines the nearest point to a particular target or its precise location.
In this research, the publications on the IoT and its combination with deep learning are systematically studied. As far as we know, no systematic review has been conducted on this subject so far, so this article should be beneficial for relevant researchers. In the first part of the article, some questions are raised, and, according to the proposed method, the available articles are compared with each other based on four categories. Finally, the research questions are answered.
The rest of this research is arranged in four sections. Section 2 presents the methodology and research questions. Section 3 compares the major studies, and the answers to the research questions are given in Section 4. Finally, Section 5 concludes the research.
The criteria applied in this research have been previously used in [4]. In our systematic review, we study journal and conference papers published in English from 2012 to July 2019. The selection and non-selection criteria for articles are described in Table 1.

Table 1. Selection Criteria and Non-Selection Criteria [4]

Selection Criteria
- Articles whose focus is "deep learning" and ("IoT" or "internet of things").
- Articles that have been published between 2012 and July 2019.

Non-selection Criteria
- Books and technical reports have been excluded.
- Articles with no available full text have been rejected.
- Non-English articles.
- Articles with no relation to the research questions.
- Duplicate (identical) articles.

2-4. Search Terms
We used the terms "deep learning" and ("Internet of Things" or "IoT") to search for articles in the mentioned databases, using the same terms in every database. We searched only article titles, assuming that the novelty of an article is generally stated in its title.

2-5. Review Steps
The articles have been selected according to the following steps:
1. The search terms are used to search the five databases mentioned in Section 2-2.
2. The non-selection criteria eliminate some of the articles.
3. After reading the title and abstract of the articles, irrelevant ones are discarded.
4. By reading the remaining articles in full, the most significant ones are selected.
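The search terms, date range, non-selection criteria, and review steps can be expressed as a small filtering pipeline. The sketch below is illustrative only: the `Record` fields and the `is_relevant` callback are hypothetical, the callback stands in for the manual title/abstract and full-text screening of steps 3 and 4, and duplicates are detected with a simple normalized-title match that we assume for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:               # hypothetical bibliographic record
    title: str
    year: int
    month: int
    kind: str               # e.g. "journal", "conference", "book", "report"
    language: str
    full_text: bool

def matches_terms(title):
    """Title contains "deep learning" and ("internet of things" or the word "IoT")."""
    t = title.lower()
    return "deep learning" in t and ("internet of things" in t or re.search(r"\biot\b", t) is not None)

def select_articles(records, is_relevant):
    """Apply the review steps; is_relevant stands in for manual screening (steps 3 and 4)."""
    seen, kept = set(), []
    for r in records:
        if not matches_terms(r.title):                          # step 1: search terms in the title
            continue
        if not ((2012, 1) <= (r.year, r.month) <= (2019, 7)):   # published 2012 to July 2019
            continue
        if r.kind in {"book", "report"} or r.language != "English" or not r.full_text:
            continue                                            # step 2: non-selection criteria
        key = " ".join(r.title.lower().split())                 # normalized title for duplicates
        if key in seen:
            continue
        seen.add(key)
        if is_relevant(r):                                      # steps 3 and 4: human judgment
            kept.append(r)
    return kept

records = [
    Record("Deep Learning for IoT Big Data", 2018, 5, "journal", "English", True),
    Record("Deep learning for IoT big data", 2018, 5, "journal", "English", True),  # duplicate
    Record("Deep Learning in the Internet of Things", 2011, 3, "journal", "English", True),
    Record("A Patriot Deep Learning Study", 2015, 1, "conference", "English", True),
    Record("Deep Learning and IoT", 2016, 2, "book", "English", True),
]
selected = select_articles(records, is_relevant=lambda r: True)
```

The word-boundary match on "IoT" avoids false hits such as "patriot", which a plain substring test would accept.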

Primary Research on Classification Plans
In this article, to study and analyze articles in the two fields of IoT and deep learning, we have categorized them based on their main idea into four main categories, shown in Figure 2. The articles in the 'Focus on data' category deal with processing IoT data to prepare it for use in deep models; they sometimes propose a new data representation, extract appropriate features, or reduce data dimensionality. In the 'Network' category, the reviewed articles are divided into three groups. The first group focuses on the network used or on changes to the IoT network; we place these articles in the 'Network Technology' branch. Deep learning needs many resources, including processing power, battery power, and memory, for training; as a result, it may not be suitable for resource-constrained IoT devices. Thus, some articles present methods for scheduling tasks in the IoT and adjusting the computational load to improve resource consumption; these articles form the 'Resource Management' group. Guaranteeing the privacy and security of information is one of the most important considerations in many IoT applications, because IoT data are sent over the Internet for analysis and are thus exposed worldwide. Although many applications use anonymization, these techniques can be broken and the anonymized data re-identified. Further, adversarial attacks, such as injecting incorrect data or crafted test inputs, threaten deep learning (DL) training models, and several properties (such as availability, reliability, validity, and certainty) may be put at risk by these attacks. We classified articles that focus on the privacy and security of information into the 'Security' group.
As shown in Figure 2, the 'Computing Environment' category includes cloud, fog, and edge computing, along with their combinations, as well as big data analysis. Since IoT devices have computational constraints and cannot quickly handle deep learning computations, some articles perform these computations in a different environment so that the proposed model can be applied. Articles that use deep learning with IoT to present a new application have been classified and reviewed in the 'Application' category.
By searching the mentioned databases, 151 articles on the Internet of Things and deep learning that met the criteria stated in Table 1 were found. After screening and removing articles, 32 articles remained. These 32 selected articles included 7 conference articles and 25 journal articles; of the latter, 19 were ranked Q1, 3 were ranked Q2, and 3, despite their scientific value, had no recorded ranking on https://www.scimagojr.com/journalrank.php. We categorized and analyzed the selected articles in four categories: 'Focus on Data' (data management), 'Network', 'Computing Environment', and 'Application'. Figure 3 illustrates the number of publications in each category and their percentage of the reviewed articles. In the following sections, we review the articles in each category.

3-1. Primary Studies of 'Data Management'
In this section, the primary studies on the IoT and deep learning that focus on data management are investigated; they account for 25% of the primary studies.
Wang and Zhang [18] proposed a tensor deep learning model for heterogeneous data fusion in the IoT. The authors used a tensor space to model the strongly non-linear distribution of IoT big data, and they introduced a tensor distance and high-order back-propagation to extend the data from a linear to a multilinear space. The proposed algorithm was compared with the stacked autoencoder and multimodal deep learning on the STL-10 and CUAVE datasets, and the authors reported improved accuracy and data fusion compared with these two methods.
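One common definition of tensor distance in the tensor-learning literature treats two tensors X and Y as flattened vectors x and y and computes d(X, Y) = sqrt(sum over l, m of g_lm (x_l - y_l)(x_m - y_m)), where g_lm = exp(-||p_l - p_m||^2 / (2 sigma^2)) / (2 pi sigma^2) and p_l is the index position of element l. Unlike the Euclidean distance, it couples differences at neighbouring positions. The sketch below assumes this definition and a hypothetical bandwidth `sigma`; we have not verified that [18] uses exactly this form.

```python
import itertools
import numpy as np

def tensor_distance(x, y, sigma=1.0):
    """Distance between equal-shaped tensors that couples differences at nearby positions."""
    if x.shape != y.shape:
        raise ValueError("tensors must have the same shape")
    # Index coordinates of every element, in the same (C) order that ravel() uses.
    coords = np.array(list(itertools.product(*(range(n) for n in x.shape))), dtype=float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    g = np.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)  # metric matrix G
    d = (x - y).ravel()
    return float(np.sqrt(max(d @ g @ d, 0.0)))  # clip tiny negative round-off

a = np.zeros((2, 3))
b = np.ones((2, 3))
dist = tensor_distance(a, b)
```

The metric matrix has one entry per pair of elements, so this direct form is only practical for small tensors.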
In 2018, Liang [19] proposed a fast and smart data processing scheme to speed up deep learning computation for real-time applications, with data pre-processing as the main objective. Data pre-processing is generally done in two ways: data reduction, which keeps a subset of the data with its main features, and data transformation, which transforms the data into a reduced set of features. In the pre-processing phase of this study, both approaches were combined to take advantage of each while preserving the physical properties of the original data in the selected subset. The method was evaluated in two scenarios: large-scale data and big data. In the large-scale data scenario, the authors used SVD-QR to select sub-datasets. The SVD computes the singular values and their corresponding singular vectors, and the singular values determine the size of the selected subset; QR is then used to select the data samples used as deep learning input. In the big data scenario, limited-memory subspace optimization for SVD (LMSVD) is applied: this method computes the dominant singular values of large matrices based on Krylov subspace optimization, and the samples are then selected by applying QR. The proposed method was simulated on handwriting recognition, a task used widely in IoT applications. After preparation in the pre-processing phase, the data are fed to a deep feedforward neural network in each of the two scenarios. The results demonstrate that the method is a powerful technique for deep learning and that SVD-QR effectively reduces the input data and energy consumption.
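The classic SVD-QR subset-selection idea can be sketched as follows: the singular values of the data matrix give the number k of samples worth keeping, and QR factorization with column pivoting applied to the top-k right singular vectors picks which samples to keep. This is a minimal NumPy sketch under our own assumptions (a greedy pivoted Gram-Schmidt in place of a library pivoted QR, and a hypothetical relative tolerance `rel_tol` for the rank); it is not Liang's implementation and does not cover the LMSVD variant used for big data.

```python
import numpy as np

def pivoted_qr_columns(a, k):
    """Greedy QR with column pivoting: return indices of k well-conditioned columns of a."""
    a = np.array(a, dtype=float)  # work on a copy
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(a, axis=0)
        norms[chosen] = -1.0                 # never pick the same column twice
        j = int(np.argmax(norms))            # pivot: largest remaining residual
        chosen.append(j)
        q = a[:, j] / np.linalg.norm(a[:, j])
        a -= np.outer(q, q @ a)              # remove that direction from every column
    return chosen

def svd_qr_select(x, rel_tol=1e-10):
    """Select representative samples (rows of x): SVD fixes how many, pivoted QR picks which."""
    _, s, vt = np.linalg.svd(x.T, full_matrices=False)  # columns of x.T are samples
    k = int(np.sum(s > rel_tol * s[0]))                 # numerical rank from singular values
    return pivoted_qr_columns(vt[:k, :], k)

x = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0]])  # five samples that span only a 2-D subspace
selected = svd_qr_select(x)
```

Here the SVD reports rank 2, so two linearly independent samples are kept out of five.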
Bu et al. [20] proposed a multi-projection deep computation model (MPDCM) for smart data in the IoT. The MPDCM generalizes the DPDCM by replacing every hidden layer of the deep computation model with a multi-projection layer. First, the MPDCM maps each multi-modal object into multiple subspaces to capture hidden characteristics in different subspaces. Then, the multi-projection autoencoder (MPTSE) learns interactive intrinsic properties by capturing correlations when mapping the subspaces to the output. The authors designed a learning algorithm based on back-propagation and gradient descent to train the parameters of the MPDCM. They then evaluated classification accuracy on the Animal-20 and NUS-WIDE-14 datasets and compared the model with the DPDCM. The results indicated that, as the number of subspaces increases, the MPDCM achieves higher classification accuracy than the DPDCM, which reflects its ability to learn big data features. Although the algorithm uses more subspaces than the DPDCM, its computational and time complexity are almost the same.
Li et al. [21] proposed a Deep Convolutional Computation Model (DCCM), based on the CNN, for heterogeneous industrial big data. The DCCM extends the CNN from the vector space to the tensor space. This tensor-based model can capture the hidden relationships across the various modalities of big data and represent heterogeneous objects. A tensor can represent the structure of heterogeneous data while maintaining the raw data structure, which allows the complementarity and mutuality across modalities to be investigated. In addition, the tensor representation may avoid several issues of the vector space, including singularity and the curse of dimensionality, so it can be employed in applications such as feature extraction, pattern recognition, and data fusion. Another advantage of the tensor space mentioned in this article is that it reduces the number of free variables, which helps avoid overfitting and shortens training time. The authors introduced a high-order back-propagation algorithm for training the parameters of the DCCM in the high-order space. Experiments on three datasets, CUAVE, SNAE2, and STL-10, indicated that the DCCM took longer to train than a CNN because it uses more weights, but less time than the DCM, because the pooling and local receptive field strategies in the tensor space reduce the number of weights efficiently.
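The weight reduction attributed to local receptive fields can be illustrated with simple arithmetic. The sketch below compares a fully connected layer with a convolutional layer that uses small shared kernels; the layer sizes (a 32x32 input with 3 modalities, 16 feature maps, 5x5 kernels) are hypothetical, biases are ignored, and the numbers are not taken from [21].

```python
from math import prod

def dense_weights(in_shape, out_units):
    """Fully connected layer: every input element connects to every output unit."""
    return prod(in_shape) * out_units

def conv_weights(kernel_shape, in_channels, out_maps):
    """Convolutional layer: one small shared kernel per (input channel, output map) pair."""
    return prod(kernel_shape) * in_channels * out_maps

# Hypothetical layer: a 32x32 input with 3 modalities mapped to 16 feature maps of 32x32.
dense = dense_weights((32, 32, 3), 16 * 32 * 32)
conv = conv_weights((5, 5), 3, 16)
```

The fully connected layer needs 50,331,648 weights, while the convolutional layer needs 1,200; pooling then shrinks the input to the next layer further.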
Mohammadi et al. [8] used semi-supervised learning to address the lack of labeled data in the IoT. They proposed a deep reinforcement learning (DRL) algorithm for Bluetooth Low Energy (BLE)-based smart indoor localization in a smart city. The experimental findings suggested that the semi-supervised deep learning model is more efficient than its supervised counterpart. The model uses a Variational Autoencoder (VAE) as the inference engine to generalize optimal policies. Moreover, the proposed model extends deep reinforcement learning to the semi-supervised setting and provides a general framework for all types of IoT applications.
Yao et al. raised four fundamental questions about the potential of deep learning for the interaction between human beings and physical objects, and examined the answers to each: Which deep neural network structures can effectively process and fuse sensor data across different applications? How can the resource usage of deep learning models be reduced so that they can be fully used on resource-constrained IoT devices? How can confidence measures be computed for deep learning predictions? And how can the need for labeled data be reduced? [22]
The DeepSense framework was introduced as an effective structure for processing and fusing sensor data across different applications with minor changes. It combines a convolutional neural network (CNN) and a recurrent neural network (RNN) and divides the sensor data into time intervals in order to process time series. DeepSense applies convolutional networks to the sensors to encode local features and fuse sensor data efficiently, and uses the RNN to extract temporal patterns. The framework can be used for both classification and estimation (regression) problems. In this article, DeepSense was applied to heterogeneous human activity recognition (HHAR) and to user identification with biometric motion analysis (User ID). Compared with other deep learning designs, this framework was shown to be more effective than the methods outlined in the article. Yao et al. also introduced DeepIoT, a compression framework whose purpose is to find the optimal dropout for the hidden elements of a given neural network. The evaluation of the DeepIoT compression algorithm suggested that it can decrease network size, energy consumption, and runtime while preserving accuracy, and a comparison with other deep learning methods is also outlined in this research. The article also briefly introduces RDeepSense, a well-calibrated uncertainty estimation algorithm for MLPs. RDeepSense uses a tunable loss function that is a weighted sum of the negative log-likelihood and the mean squared error, and the authors showed that it provides high-quality uncertainty estimates [22].
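Two ideas from this paragraph can be sketched briefly in NumPy. The first is splitting a sensor stream into fixed-length time intervals, as DeepSense does before its convolutional and recurrent layers. The second is a loss that mixes a Gaussian negative log-likelihood with the mean squared error, in the spirit of RDeepSense's tunable loss. The interval length, the Gaussian output assumption, and the weighting form alpha * NLL + (1 - alpha) * MSE are our assumptions, not the authors' exact formulation.

```python
import numpy as np

def split_into_intervals(samples, interval_len):
    """Cut a (T, channels) sensor stream into non-overlapping intervals (the tail is dropped)."""
    n = samples.shape[0] // interval_len
    return samples[: n * interval_len].reshape(n, interval_len, samples.shape[1])

def gaussian_nll(y, mu, var):
    """Mean negative log-likelihood of y under N(mu, var)."""
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var) + (y - mu) ** 2 / (2.0 * var)))

def tunable_loss(y, mu, var, alpha):
    """Weighted sum of Gaussian NLL and MSE; alpha in [0, 1] is a hypothetical weighting knob."""
    mse = float(np.mean((y - mu) ** 2))
    return alpha * gaussian_nll(y, mu, var) + (1.0 - alpha) * mse

windows = split_into_intervals(np.arange(20.0).reshape(10, 2), interval_len=3)
y = np.array([1.0, 2.0])
mu = np.array([1.0, 2.0])    # predicted means
var = np.array([1.0, 1.0])   # predicted variances
```

With alpha = 0 the loss reduces to plain MSE and ignores the predicted variance; raising alpha makes the model pay for poorly calibrated uncertainty.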
In another article [23], Yao et al. introduced a semi-supervised framework for the IoT. The proposed framework, SenseGAN, consists of three components: a generator, a classifier, and a discriminator. The generator produces data similar to the sensed data. Then, the classifier labels these data so that the discriminator cannot distinguish between the real and the generated data. This process is repeated until the classifier is well trained. The framework uses a convolutional structure and DeepSense for the classifier. The authors evaluated it on three datasets: user identification with biometric motion analysis (User ID), Wi-Fi signal-based gesture recognition (Wisture), and HHAR. The evaluations revealed that the framework is suitable for both labeled and unlabeled data and can improve the classifier's predictions without additional time or energy consumption.
Moreover, Khelifi et al. [24] proposed several DLM-based methods for IoT applications. They introduced the Edge of Information-Centric IoT to reduce latency for time-critical applications, proposing a fusion approach that joins ICN, IoT, and edge computing. They applied an RNN to process online data, keeping a history of the exchanged data and making future predictions. A key advantage of this technique is a reduction in the volume of data and in the computation and processing of diverse data in real time.
The primary studies on DLMs and IoT that focus on data are summarized and analyzed in Tables 3 and 4.

Table 3. Primary studies focusing on data (reference and year, main idea, advantages, disadvantages)

[18]
Main idea: Applying tensor spaces.
Advantages: simulating the nonlinear distribution of big data using tensors; improving heterogeneous data composition; better detection accuracy than CDL and MML for big data.
Disadvantages: in data integration, all data is considered and redundant data is not removed.

[19], 2018
Main idea: Preprocessing data using SVD-QR and LMSVD-QR.
Advantages: increasing computational speed; reducing energy consumption.
Disadvantages: focuses only on pre-processing to increase computational speed.

[20], 2019
Main idea: Applying a multi-projection deep computation model.
Advantages: high accuracy; time and computational complexity the same as the DPDCM.
Disadvantages: the classification results are unstable due to the effect of the initial parameters.

[21], 2018
Main idea: Applying tensor space.
Advantages: avoiding singularity and the curse of dimensionality; achieving better classification than the DCM and MDL without a high training cost; reducing the number of weights in the tensor space.
Disadvantages: the model contains a relatively large number of parameters in the high-order tensor space.

[8], 2018
Main idea: Applying semi-supervised deep reinforcement learning to address the lack of labeled data.
Advantages: the model learned optimal action policies, resulting in a better estimation of the target locations.
Disadvantages: not mentioned.

[22], 2018
Main idea: Introducing new frameworks for the purposes mentioned in the article.
Advantages: reducing the running time; reducing the need for labeled data; increasing accuracy.
Disadvantages: further investigations are necessary to confirm the applicability of the results.

[23], 2018
Main idea: Introducing the SenseGAN framework.
Advantages: training the model with 10% of labeled data.
Disadvantages: not optimized for multi-sensor data; addresses only classification, not regression; the learning process is computationally tricky.

[24], 2019
Main idea: Combining edge computing, ICN, and IoT.
Advantages: reducing latency for time-critical applications; reducing the volume of data; computing and processing in real time; improving reliability and performance; mitigating deployment complexity and enhancing the flexibility of network communication.
Disadvantages: the training time is high.

Table 4 compares the articles in the 'Focus on Data' domain based on year of publication, experimental type, applied deep learning model, compared models, dataset, comparison criteria, tools, and language. The experimental type specifies the type of the proposed scheme: numerical analysis, implementation, simulation, design, or mathematical proof. The applied deep learning model characterizes the model used in the idea expressed in the article. The compared models column shows the models the article was compared with, and the dataset column gives the dataset used by the authors. The comparison criteria column shows the criteria used to compare the models, while the tool and language columns describe the tools and programming language used, respectively.

[Table 4. Comparison of the primary studies in the 'Focus on Data' category]

3-3. Primary Studies of Network
In this section, the primary studies on the IoT and deep learning that focus on the network are investigated; they account for 31% of the primary studies.
A framework based on Software-Defined Networking (SDN) was presented in [25]. The proposed architecture is scalable, flexible, and secure for the IoT, and it includes an IoT layer and an SDN layer. The SDN layer consists of control and forwarding layers and relies on an intrusion detection system, that is, a hardware and software system designed to monitor traffic for attacks. The authors used Restricted Boltzmann Machines (RBMs) for intrusion detection. An RBM works in two steps, forward and backward. In the forward step, the hidden nodes are a function of the input, weights, biases, and an activation function, and whether each node is activated is decided stochastically; the output of this step is a probability vector. In the backward step, samples of this output are used to reconstruct the input. The authors used the KDD99 dataset to detect four types of attacks and intrusions, and they reported that the accuracy of intrusion detection was higher than that of other methods, with an improvement of about 9%.
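The forward and backward steps described above correspond to one round of Gibbs sampling in an RBM, which is commonly used for training with contrastive divergence (CD-1). The NumPy sketch below trains a small binary RBM on toy data; the data, the number of hidden units, and the learning rate are hypothetical, and real KDD99 records would first need to be encoded and normalized.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruction_error(v, w, b_v, b_h):
    """Mean squared error between v and its deterministic (mean-field) reconstruction."""
    p_v = sigmoid(sigmoid(v @ w + b_h) @ w.T + b_v)
    return float(np.mean((v - p_v) ** 2))

def train_rbm_cd1(v, n_hidden, lr=0.5, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, (v.shape[1], n_hidden))
    b_v, b_h = np.zeros(v.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        # Forward step: hidden probabilities from input, weights, biases, and the sigmoid,
        # then a stochastic binary decision for each hidden node.
        p_h = sigmoid(v @ w + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Backward step: reconstruct the input from the sampled hidden states.
        p_v = sigmoid(h @ w.T + b_v)
        p_h_recon = sigmoid(p_v @ w + b_h)
        # Contrastive divergence (CD-1): data statistics minus reconstruction statistics.
        w += lr * (v.T @ p_h - p_v.T @ p_h_recon) / len(v)
        b_v += lr * (v - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h_recon).mean(axis=0)
    return w, b_v, b_h

data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)  # toy binary records
w0 = np.random.default_rng(0).normal(0.0, 0.01, (6, 3))
before = reconstruction_error(data, w0, np.zeros(6), np.zeros(3))
w, b_v, b_h = train_rbm_cd1(data, n_hidden=3)
after = reconstruction_error(data, w, b_v, b_h)
```

In an intrusion-detection setting, a high reconstruction error or the learned hidden features would then feed the attack classifier.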
In another article [26], SDN was used to manage the industrial IoT dynamically, and the Software-Defined Industrial Internet of Things (SDIIoT) was introduced. In the SDIIoT, industrial devices create large amounts of data and flows, which are handled by physically distributed but logically centralized controllers. However, one of the most difficult issues is how to reach consensus among multiple controllers in complex industrial environments. This article proposes a blockchain (BC)-based consensus protocol. For distributed SDIIoT, a BC can act as a trusted, out-of-band third party that coordinates the different SDN controllers in a secure, reliable, and traceable way. The authors used a permissioned BC because of its low cost, low delay, and low bandwidth requirements. They formulated view change, access selection, and computational resource allocation as a joint optimization problem and, modeling it as a Markov decision process, proposed a new dueling deep Q-learning method to solve it. The simulation results showed the convergence and efficiency of this technique. Finally, the authors argued that measuring the trustworthiness of nodes is an important topic for future studies.
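The dueling architecture used by dueling deep Q-learning splits the network into a state-value stream V(s) and an advantage stream A(s, a), and recombines them as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The sketch below shows only this aggregation step with random linear heads; the feature size and the three joint actions are hypothetical and do not reproduce the action space of [26].

```python
import numpy as np

def dueling_q_values(features, w_v, b_v, w_a, b_a):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    value = features @ w_v + b_v   # (batch, 1): how good the state is
    adv = features @ w_a + b_a     # (batch, n_actions): relative merit of each action
    return value + adv - adv.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                 # hypothetical shared-layer features, 4 states
w_v, b_v = rng.normal(size=(8, 1)), np.zeros(1)
w_a, b_a = rng.normal(size=(8, 3)), np.zeros(3)  # 3 hypothetical joint actions
q = dueling_q_values(feats, w_v, b_v, w_a, b_a)
greedy_actions = q.argmax(axis=1)
```

Subtracting the mean advantage makes the split identifiable: averaged over actions, Q equals V, so the value stream cannot drift against the advantage stream.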
McDermott et al. [27] proposed a solution for detecting botnet activity in consumer IoT devices and networks. Their method is based on a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN), with word embedding used to recognize text and convert attack packets into the correct tokenized format. The authors created a dataset containing four attack vectors of the Mirai botnet (UDP, Mirai, ACK, and DNS) and compared the accuracy and error of the new method with those of an LSTM-RNN on these attack vectors. The new method worked well for the UDP, DNS, and Mirai attack vectors, with reliability coefficients of 98%, 98%, and 99%, respectively. However, it did not work as well for the ACK attack vector, which requires more training data.
In another article [28], a wireless device identification platform based on deep learning was introduced to provide security. Radio frequency (RF) fingerprinting, a physical-layer authentication method, can be employed to detect licensed (allowed) wireless devices, and DL is a suitable way to learn the features of different RF devices from their RF data. This article focused on ZigBee devices (IEEE 802.15.4) in wireless sensor networks, where ZigBee can use multiple network topologies such as star, tree, and mesh to transfer data from the source to the base station or to other peer-to-peer nodes. For training, RF signals (IQ samples) are collected with a USRP device from multiple ZigBee devices that are already registered and listed on the network, and the proposed DL-based model is constructed from this labeled data. After training, the model can be applied to identify devices on the network. The model is transparent and passive with respect to the RF devices, so no additional software or hardware needs to be installed on the wireless devices. Six different ZigBee devices were considered, each configured to transmit at five different SNR levels, and the 300 gigabytes of data collected from these devices were suitable for training a deep learning model. The authors investigated three deep learning models: DNN, CNN, and LSTM. The benefit of DL is its ability to extract features automatically. The classification results can be used for intrusion detection, and a warning can be issued when an unregistered attacker breaches security.
In [29], the authors introduced a deep learning-based system for detecting anomalies in Internet Industrial Control Systems (IICS). The proposed method consists of two stages: unsupervised and supervised. The unsupervised stage provides the initial values for the supervised stage: AL-Hawawreh et al. used a Deep Auto-Encoder (DAE) to obtain the initial weights of the model parameters. An auto-encoder uses the input data to reproduce the same data at its output and tries to minimize the reconstruction error. The trained parameters then serve as the starting point, that is, the initial weights and biases, of a Deep Feed-Forward Neural Network (DFFNN). The authors evaluated the model on the UNSW-NB15 and NSL-KDD datasets and compared it with other methods. They found that their method outperformed the others because of its dimensionality reduction and automatic feature extraction. Further, normal behaviors and attacks were easy to distinguish because the model was initially trained on a normalized dataset, and the additional training performed on the model also protected the system against complex attacks.
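The two-stage scheme (unsupervised auto-encoder pretraining, then reuse of the learned weights and biases as the starting point of a feed-forward network) can be sketched in NumPy as follows. The single-layer auto-encoder, the sigmoid activations, the toy binary data, and the hypothetical helper names `pretrain_autoencoder` and `init_dffnn` are our assumptions; the supervised fine-tuning of the DFFNN is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(x, n_hidden, lr=1.0, epochs=300, seed=0):
    """Unsupervised stage: fit encoder/decoder weights by minimizing reconstruction error."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_enc, b_enc = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    w_dec, b_dec = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w_enc + b_enc)        # encode
        x_hat = sigmoid(h @ w_dec + b_dec)    # decode (inputs assumed scaled to [0, 1])
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the per-sample squared error, back-propagated through both layers.
        g_out = err * x_hat * (1.0 - x_hat) / len(x)
        g_hid = (g_out @ w_dec.T) * h * (1.0 - h)
        w_dec -= lr * h.T @ g_out
        b_dec -= lr * g_out.sum(axis=0)
        w_enc -= lr * x.T @ g_hid
        b_enc -= lr * g_hid.sum(axis=0)
    return w_enc, b_enc, losses

def init_dffnn(w_enc, b_enc, n_classes, seed=0):
    """Supervised stage starts here: the first layer reuses the pretrained encoder."""
    rng = np.random.default_rng(seed)
    return {"w1": w_enc.copy(), "b1": b_enc.copy(),
            "w2": rng.normal(0.0, 0.1, (w_enc.shape[1], n_classes)), "b2": np.zeros(n_classes)}

x = (np.random.default_rng(1).random((200, 8)) < 0.3).astype(float)  # toy normalized records
w_enc, b_enc, losses = pretrain_autoencoder(x, n_hidden=4)
model = init_dffnn(w_enc, b_enc, n_classes=2)
```

Only the output layer starts from random values; the hidden layer starts from weights that already compress the input.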
Ayadi et al. [30] used deep learning in Named Data Networking (NDN) routers so that a router can forward packets intelligently. In NDN, the forwarding strategy is the component responsible for selecting the next hop according to the route cost, forwarding metrics, and local policies. A neural network can be used to predict the drop rate in the network from traffic predictions. The authors used the predicted overload probability of every link as a signal for minimizing the drop rate under the new network status, thereby increasing the number of forwarded packets. To design the DNN model, they used a two-layer feed-forward network with a sigmoid function in the hidden layer and a linear function in the output layer, and they trained it with the back-propagation algorithm. A dataset of static information for each router consisted of the traffic on each link. Using a dataset from the project of Dongo et al. to evaluate their model, they achieved 99.23% accuracy with four hidden neurons and over 70 epochs without overfitting. Also, assuming that IoT devices support NDN, Ayadi et al. evaluated the proposed method for an NDN-based audio conference.
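A network of the shape described above (one sigmoid hidden layer with four neurons and a linear output, trained by back-propagation) can be sketched as follows. The per-link load features and the synthetic drop-rate target are hypothetical stand-ins for the router statistics in [30], and the sketch regresses a drop rate rather than reproducing the authors' accuracy metric.

```python
import numpy as np

def train_drop_rate_net(x, y, n_hidden=4, lr=0.1, epochs=2000, seed=0):
    """Sigmoid hidden layer + linear output, trained by full-batch back-propagation on MSE."""
    rng = np.random.default_rng(seed)
    w1, b1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden)), np.zeros(n_hidden)
    w2, b2 = rng.normal(0.0, 0.5, (n_hidden, 1)), np.zeros(1)
    history = []
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))   # hidden layer (sigmoid)
        y_hat = h @ w2 + b2                          # output layer (linear)
        err = y_hat - y
        history.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / len(x)                   # d(MSE)/d(y_hat)
        g_h = (g_out @ w2.T) * h * (1.0 - h)         # back-propagate through the sigmoid
        w2 -= lr * h.T @ g_out
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * x.T @ g_h
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), history

rng = np.random.default_rng(0)
loads = rng.random((300, 3))                                        # hypothetical per-link loads
drop = np.clip(loads.mean(axis=1, keepdims=True) - 0.5, 0.0, None)  # synthetic drop rate
params, history = train_drop_rate_net(loads, drop)
```

The small learning rate keeps full-batch gradient descent stable for the linear output layer; `history` records the training MSE per epoch.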
In [31], the authors studied the large volume of information collected in the IoT and balanced the load on the network. They introduced an agent called Load Bot for effective load balancing in the IoT domain. This agent measures the load factor of the network and analyzes the configuration of the structure for large volumes of data and network load; by applying deep belief network learning methods, it can achieve effective load balancing. They also introduced another agent, Balance Bot, based on deep Q-learning to predict the network load. They created a grid-structured map using a deep belief network built from RBMs. Reinforcement learning is carried out through actions while experiencing the load in a given environment, and a scalar reinforcement value is obtained by evaluating the selected action. Q-learning does not compute the desired action directly from the current state; instead, it learns the optimal action for each experienced state through trial and error. Kim et al. used the Neural Prior Ensemble method described in [18] to predict the network load. When new data arrive, they are accumulated up to a certain amount, transformed into a new deep belief network, and saved. The new belief network then receives previous information about the network by combining Load Bot and the Neural Prior Ensemble, learning the entire network load and extracting the weight-change process. By simulating their scheme, the authors found that the number of migrations did not increase as the number of sensors increased, and that the proposed method was closer to optimal than dynamic methods.

Another study [32] investigated a DL framework for dynamic watermarking in the IoT, which enables the IoT cloud to identify cyber-attacks and authenticate the reliability of received signals.
The proposed algorithm used the LSTM for extracting random properties such as spectral atness, skewness, and kurtosis, as well as the central moments of the IoT signals, with these properties being watermarked and placed alongside the original signal. The extracted features of the cloud prevented the attacker to read the watermarked data, and the eavesdropping attack would not be successful. This new LSTM reduced the complexity as well as a delay of the attack identi cation in comparison to the other security models. According to the simulation of the proposed algorithm, the attack was detected in less than 1 second and the IoT signals could be sent with high reliability from the IoT device to the cloud [32].
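The stochastic features listed for [32] have standard definitions, sketched below in NumPy under the assumption that they are computed over a window of raw signal samples. Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum (near 0 for a pure tone, larger for noise); skewness and kurtosis are the normalized third and fourth central moments. The LSTM and the watermark-embedding step of [32] are not reproduced, and the function name `signal_features` is hypothetical.

```python
import numpy as np

def signal_features(x, eps=1e-12):
    """Spectral flatness, skewness, kurtosis and central moments of a signal window."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 + eps                  # power spectrum (eps avoids log(0))
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)  # geometric / arithmetic mean
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))          # central moments
    return {
        "flatness": float(flatness),
        "skewness": float(m3 / m2 ** 1.5),
        "kurtosis": float(m4 / m2 ** 2),  # Pearson kurtosis (3 for a Gaussian)
        "m2": float(m2), "m3": float(m3), "m4": float(m4),
    }

# Hypothetical windows: a pure tone (8 full periods) and white noise
n = np.arange(1024)
tone = signal_features(np.sin(2 * np.pi * 8 * n / 1024))
noise = signal_features(np.random.default_rng(0).normal(size=1024))
print(tone["flatness"], noise["flatness"], tone["kurtosis"])
```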
In another article, Ferdowsi et al. [33], in addition to the method mentioned above, used game theory to accelerate the gateway's decision-making in identifying vulnerable IoT devices. They stated that in massive IoT scenarios, verifying all devices at the same time is not possible because of computational resource constraints, and that a game-theoretic framework can improve the identification of vulnerable devices. They provided two learning algorithms. The first is a fictitious play algorithm, which uses full information about the state of all IoT devices and converges to the mixed-strategy Nash equilibrium. The second is a deep reinforcement learning algorithm based on LSTM blocks, which can learn the security state from previous gateway states; when the gateway's information about the state of the IoT devices is incomplete, it can predict that state. The simulation results showed improved system protection, with the number of compromised IoT devices reduced by about 30%.
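To illustrate the fictitious play idea used in [33], the sketch below runs it on matching pennies, a two-player zero-sum game whose only Nash equilibrium is mixed (each action played with probability 0.5). Reading the row player as a gateway choosing which device group to inspect and the column player as an attacker choosing which group to compromise is an assumption made for illustration; the payoffs in [33] are different and richer. In each round, both players best-respond to the empirical frequencies of the opponent's past actions, and these frequencies approach the mixed equilibrium.

```python
import numpy as np

def fictitious_play(A, rounds=10000):
    """Fictitious play for a two-player zero-sum game.

    A[i, j] is the row player's payoff; the column player receives -A[i, j].
    Returns the empirical action frequencies of both players.
    """
    A = np.asarray(A, dtype=float)
    row_counts = np.zeros(A.shape[0])
    col_counts = np.zeros(A.shape[1])
    for _ in range(rounds):
        # Best responses to the opponent's history so far (ties go to the lowest index)
        i = int(np.argmax(A @ col_counts))  # row player maximizes its expected payoff
        j = int(np.argmin(row_counts @ A))  # column player minimizes the row player's payoff
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / rounds, col_counts / rounds

# Matching pennies: hypothetical gateway (rows) vs. attacker (columns)
p, q = fictitious_play([[1, -1], [-1, 1]], rounds=10000)
print(p, q)
```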
Zhu and his colleagues [34] proposed a novel deep Q-learning-based transmission scheduling method for the cognitive IoT. The mechanism uses a Markov decision process to describe the various transmission states. They proposed a relay equipped with a Q-learning algorithm that forwards packets from other nodes to the sink to improve performance. To speed up the mapping between states and actions, they used stacked autoencoders. The model was compared with three algorithms: Strategy Iteration (SI), W-Learning (WL), and Random Selection (RS). The authors demonstrated that their model performs better than RS and WL in terms of throughput, packet loss, and system utility, and it also outperformed WL in power consumption. Although the proposed algorithm performed worse than SI, its complexity was lower, so it could be applied to practical scenarios. The authors argued that the algorithm could be improved further by using multiple relays or cooperative relays.
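Both Balance Bot [31] and the relay in [34] build on Q-learning. The tabular sketch below shows the trial-and-error update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)] on a hypothetical two-state relay problem (bad or good channel; wait or transmit). The rewards and the channel model are assumptions, and the deep parts of the original methods (the deep belief network in [31] and the stacked autoencoders in [34]) are replaced here by a plain Q-table.

```python
import numpy as np

def q_learning(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2  # state: 0 = bad channel, 1 = good channel; action: 0 = wait, 1 = transmit
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # Epsilon-greedy exploration: occasionally try a random action (trial and error)
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        # Hypothetical reward: delivery on a good channel, packet loss on a bad one
        if a == 1:
            r = 1.0 if s == 1 else -1.0
        else:
            r = 0.0
        s_next = int(rng.random() < 0.5)  # channel state changes at random, independent of the action
        # Q-learning (off-policy temporal-difference) update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
    return Q

Q = q_learning()
print(Q.round(2), Q.argmax(axis=1))  # greedy policy per channel state
```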
The primary studies in the field of deep learning and IoT that focused on the network have been surveyed and analyzed. Our considerations are outlined in Tables 5 and 6, and Table 4 compares the articles by publication, experiment type, and other features.

3-4. Primary Studies of Computing Environment
The study in [35] compared and evaluated three representative methods, Parallel Acceleration, Quantization, and Model Pruning, for enabling deep learning in the IoT. The authors measured the impact of these methods on the Nvidia Tegra X2 platform, which integrates ARM and GPU cores. Two kinds of approaches are used to apply DLMs at the edge of the IoT: the first equips IoT devices with a CPU or ARM processor to enhance their processing power, and the second uses a middle layer to preprocess data and prepare it for processing in the IoT. The authors of this study used both approaches. For hardware acceleration, they considered multi-core implementation and instruction optimization. For the second approach, they proposed a lightweight DL model and evaluated model pruning. Finally, they applied quantization, which involves optimization at both the hardware and the DL-model levels. The deep learning model used in this study was a CNN, which was evaluated in different ways.
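Two of the techniques evaluated in [35] can be sketched on a single weight matrix with NumPy. The sketch assumes magnitude-based pruning (zeroing the smallest-magnitude weights) and symmetric 8-bit linear quantization with one scale per tensor; [35] may use different criteria, and the weights here are random stand-ins for a trained CNN layer.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(round(sparsity * w.size))
    pruned = w.copy()
    if k == 0:
        return pruned
    threshold = np.sort(np.abs(w), axis=None)[k - 1]  # k-th smallest magnitude
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(w):
    """Symmetric linear quantization to int8 with one scale for the whole tensor."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

# Hypothetical stand-in for a trained layer's weights
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(40, 25))
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
err = np.max(np.abs(dequantize(q, scale) - w_pruned))  # rounding error is at most scale / 2
print(np.mean(w_pruned == 0.0), q.dtype, err <= scale / 2)
```

Pruning first and quantizing afterwards keeps the pruned zeros exactly representable (they quantize to 0), so the two savings combine.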
Wei et al. [36] pointed out problems of cloud-based IoT and addressed its long service delays and heavy back-haul bandwidth usage. They used fog-based IoT to reduce service delays and save back-haul bandwidth. However, the performance of fog-based IoT depends on effective and intelligent management of network resources, so the joint coordination of storage, communication, and computation is one of the major challenges to be solved. Their study jointly addressed the content caching strategy, the computation offloading policy, and radio resource allocation, providing a joint optimization solution based on deep learning for fog-based IoT. Since service requests and wireless signals are random, they used an actor-critic reinforcement learning (RL) framework to solve the joint decision problem of minimizing latency. A DNN is employed in both the actor and the critic. In the critic, it serves as a function approximator that estimates the value function over the huge state and action spaces. In the actor, it represents the parameterized stochastic policy, which is improved with the help of the critic. The authors also used a gradient method to avoid converging to a local maximum. The offloading simulation results indicated that tasks with a low computational load were performed at the edge while those with heavy computational loads were performed in the cloud, which improved efficiency. Also, the average end-to-end service latency decreased as the number of nodes increased and as more bandwidth and sub-channels were assigned to each user.
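The actor-critic structure described for [36] can be sketched in tabular form: the critic estimates a value per state, its temporal-difference (TD) error tells the actor whether a sampled action was better or worse than expected, and the actor follows the policy gradient of a softmax policy. The two task types, the edge/cloud actions, and the mean latencies below are hypothetical and chosen only to echo the reported outcome (light tasks at the edge, heavy tasks in the cloud); the original work uses DNNs for both actor and critic over a much larger joint caching, offloading, and radio-allocation state space.

```python
import numpy as np

# Hypothetical mean latencies (arbitrary units). Rows: task type (0 = light, 1 = heavy);
# columns: action (0 = process at the edge, 1 = offload to the cloud).
MEAN_LATENCY = np.array([[1.0, 3.0],
                         [6.0, 4.0]])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def actor_critic(steps=5000, lr_actor=0.05, lr_critic=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((2, 2))  # actor: softmax policy logits per task type
    V = np.zeros(2)           # critic: estimated value per task type
    for _ in range(steps):
        s = int(rng.integers(2))                          # a light or heavy task arrives
        pi = softmax(theta[s])
        a = int(rng.choice(2, p=pi))                      # sample from the stochastic policy
        r = -(MEAN_LATENCY[s, a] + rng.normal(0.0, 0.5))  # reward = negative latency
        delta = r - V[s]                                  # TD error (each task is one step)
        V[s] += lr_critic * delta                         # critic update
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0                             # gradient of log pi(a|s) w.r.t. theta[s]
        theta[s] += lr_actor * delta * grad_log_pi        # actor update, guided by the critic
    return theta, V

theta, V = actor_critic()
print(["edge" if a == 0 else "cloud" for a in theta.argmax(axis=1)])
```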
Tang et al. [37] deployed a convolutional neural network on IoT hardware. They used the SqueezeNet architecture and the ARM Compute Library (ACL) to speed up deep learning processing and reduce the latency of offloaded computation. The authors used the Nanomsg messaging framework to exchange messages between tasks on the IoT device and an NNVM-based compiler to optimize the model. The method was implemented in TensorFlow with optimization of ARM NEON vector computations enabled. Further, NEON-capable building blocks were used to develop the SqueezeNet engine. Comparing TensorFlow and ACL, they found that although the ACL library consumed more memory and power, its runtime was about 150 milliseconds shorter.
Nowadays, fog computing is popular in the IoT because bandwidth and computational resources in centralized clouds are limited, and the cloud alone is not sufficient to process and analyze large amounts of data. In this regard, Lyu et al. [38] presented a three-level Fog-Embedded Privacy-Preserving Deep Learning framework (FPPDL) to address challenges such as privacy, response delays, and computation and communication bottlenecks. In their framework, computation is performed in fog nodes close to the end devices, and a two-level privacy-preserving mechanism is used. Experimental results on three benchmark image-classification datasets showed that the framework achieves good accuracy, provides fault tolerance, and is scalable.
Diro et al. [39] presented a distributed DLM that parallelizes training and shares parameters with local nodes in the fog. They evaluated their model on the NSL-KDD dataset for intrusion detection in computer networks, in two settings: (i) two classes (normal and attack) and (ii) four classes (Normal, DoS, Probe, R2L.U2R). The test data were used to detect zero-day attacks, which occur frequently because of the different protocols in the IoT. The authors pursued two goals. The first was to compare distributed attack detection with a centralized system by implementing the DLM on a single node and on multiple coordinated nodes. The second was to evaluate deep learning against shallow learning for detecting attacks on IoT-based systems. After hyper-parameter optimization, the DL system used 123 input features, 150 neurons in the first hidden layer, and 120 and 50 neurons in the second and third hidden layers, respectively. The number of neurons in the last layer equaled the number of classes. The model was trained with various batch sizes for 50 epochs and used dropout to avoid overfitting.
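A forward-pass sketch of the network reported for [39] (123 input features, hidden layers of 150, 120, and 50 neurons, and an output layer with one neuron per class) is shown below in NumPy. The ReLU activations, softmax output, He initialization, and the dropout rate of 0.2 are assumptions, since the text does not state them; training (50 epochs with various batch sizes) is omitted.

```python
import numpy as np

LAYER_SIZES = [123, 150, 120, 50]  # input features and hidden layers reported for [39]

def init_mlp(n_classes, seed=0):
    """He-initialized weights for 123-150-120-50-n_classes (initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    sizes = LAYER_SIZES + [n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X, dropout=0.2, training=False, rng=None):
    """Class probabilities; inverted dropout on hidden layers only when training."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)             # ReLU hidden layer (assumed)
        if training and dropout > 0.0:
            keep = rng.random(h.shape) >= dropout  # drop each unit with probability `dropout`
            h = h * keep / (1.0 - dropout)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)        # softmax over the classes

# Binary (normal vs. attack) and four-class variants on a hypothetical batch of 8 records
X = np.random.default_rng(1).normal(size=(8, 123))
binary, four_class = init_mlp(2), init_mlp(4)
print(forward(binary, X).shape,
      forward(four_class, X, training=True, rng=np.random.default_rng(2)).shape)
print(sum(W.size + b.size for W, b in binary))  # number of trainable parameters
```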
Li et al. [40] provided an elastic model compatible with various deep learning models on edge-computing-enabled IoT, together with an online algorithm that improves service capacity and offloading strategies. Because intermediate data sizes and preprocessing overheads differ across DLMs, the study formulates a scheduling problem that maximizes the number of DL tasks subject to the network bandwidth and the service capacity of the edge nodes, and then introduces offline and online algorithms to solve it. The scheduling algorithm was used to reduce the network traffic load when transmitting data from sensors to a cloud server. The article notes the limitations of edge servers relative to cloud servers and exploits the fact that DL networks reduce data size in higher layers, so placing the lower layers on the edge server can reduce network traffic. However, as stated above, the edge server has limited capacity. Li and his colleagues first trained the DL network on a cloud server and then divided it into two parts: the lower layers, close to the input data, and the higher layers, close to the output. The first part was deployed on the edge server and the second on the cloud server. The proposed algorithm attempts to maximize the number of tasks in the edge computing structure by deploying deep layers on the IoT edge server while guaranteeing the transmission delay required by each task. CNNs were used in this article: ten tasks were run with different CNN networks, and the number of operations and the amount of intermediate data produced by every layer were recorded. Li et al. showed that DL networks shrink the input data, with most of the reduction occurring in the lower layers, while the computational overhead grows with additional layers.
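The layer-partitioning idea in [40] can be illustrated for a single task: given the output size and compute cost of each layer, choose, among the splits whose lower layers fit within the edge server's capacity, the one that minimizes the data sent to the cloud. This greedy single-task rule, the function name `best_split`, and the numbers below are hypothetical; the algorithms in [40] schedule many tasks jointly under bandwidth and delay constraints.

```python
def best_split(out_sizes, layer_costs, edge_budget):
    """Choose how many lower layers to run on the edge server.

    out_sizes[0] is the input size and out_sizes[k] the output size of layer k;
    layer_costs[k - 1] is the compute cost of layer k. Layers 1..k on the edge must
    fit within edge_budget, and out_sizes[k] is what is then sent to the cloud.
    Returns (k, transmitted_size) minimizing the transmitted data.
    """
    best = (0, out_sizes[0])        # k = 0: send the raw input to the cloud
    used = 0.0
    for k in range(1, len(out_sizes)):
        used += layer_costs[k - 1]  # also run layer k on the edge server
        if used > edge_budget:
            break                   # the lower layers no longer fit on the edge
        if out_sizes[k] < best[1]:
            best = (k, out_sizes[k])
    return best

# Hypothetical CNN profile: data shrinks fastest in the lower layers,
# while per-layer compute grows in the higher ones.
sizes_kb = [600, 400, 150, 60, 40, 20, 10]  # input, then the outputs of layers 1..6
costs_gflop = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(best_split(sizes_kb, costs_gflop, edge_budget=3.5))
print(best_split(sizes_kb, costs_gflop, edge_budget=1.0))
```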
In another article, Zhang et al. [41] presented the Adaptive Deep Computational Model (ADCM) for learning the characteristics of big data in the industrial IoT. An adaptive dropout rate, designed using an adaptive distribution function, prevents overfitting and adjusts the activation rate according to the position of the layer. They also used crowdsourcing, which combines human intelligence with machine power to alleviate the limited availability of training samples, to accumulate labeled samples for training the model parameters. They found that crowdsourcing combined with cloud computing can enhance the performance of the deep computational model. To obtain labeled training samples, some unlabeled examples were sent to the cloud platform, and the labels were obtained by collecting the responses of human workers on that platform; the response-collection method used was SLME [11], which is designed for multiple labeling. The authors evaluated their method on the CUAVE and SNAE2 datasets and found the model suitable for preventing overfitting and for providing labeled examples for training the deep computational model. To evaluate the stability of the new method, all models were trained five times, and the average classification accuracy was used to validate the efficiency of the proposed model. The authors noted that the model's results depend on its initialization and could be improved by addressing this.
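For the crowdsourced labels in [41], the simplest way to aggregate several workers' answers is a majority vote, sketched below. This is a plain baseline, not SLME, which is designed for multiple labeling; the function name, sample identifiers, and labels are hypothetical.

```python
from collections import Counter

def aggregate_labels(responses):
    """Majority vote per sample; ties go to the label encountered first."""
    return {sample: Counter(labels).most_common(1)[0][0]
            for sample, labels in responses.items()}

# Hypothetical worker responses collected on the cloud platform
responses = {
    "sample-1": ["normal", "normal", "fault"],
    "sample-2": ["fault", "normal", "fault", "fault"],
    "sample-3": ["normal", "fault"],
}
labels = aggregate_labels(responses)
print(labels)
```

A majority vote treats every worker as equally reliable; methods aimed at crowdsourcing typically go further and weight workers by estimated quality.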
The authors of [42] presented an edge-based framework that establishes a trade-off between communication cost and data freshness. In this framework, transient IoT data are handled by intelligently perceiving the environment through deep reinforcement learning (DRL) and a Markov Decision Process (MDP). The caching policy is then selected and learned from the history and the current raw observations of the environment. The results suggested that the users' long-term cost was reduced while the long-term utility of fetching transient data items increased. At the beginning of training, the proposed algorithm is less efficient than the other algorithms, and its advantage appears only after a long time.
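The trade-off in [42] can be made concrete with a per-request cost. The sketch below assumes a hypothetical weighted cost in which serving a cached item costs β·(age/lifetime) for lost freshness and fetching a fresh item costs (1 − β)·c for communication; the function names, weighting, and numbers are illustrative only. The DRL agent in [42] learns a caching policy that minimizes such costs over the long term without knowing future popularity, whereas this myopic rule looks at one request at a time.

```python
def request_cost(age, lifetime, comm_cost, beta, fetch):
    """Hypothetical cost of serving one request for a transient IoT data item."""
    if fetch:
        return (1.0 - beta) * comm_cost     # pay communication to obtain fresh data
    return beta * min(age / lifetime, 1.0)  # pay for the staleness of the cached copy

def should_fetch(age, lifetime, comm_cost, beta=0.5):
    """Myopic rule: fetch when staleness would cost more than communication."""
    return (request_cost(age, lifetime, comm_cost, beta, fetch=True)
            < request_cost(age, lifetime, comm_cost, beta, fetch=False))

# A cached reading with a 10 s lifetime and a normalized fetch cost of 0.4
decisions = [should_fetch(age, lifetime=10.0, comm_cost=0.4) for age in (1.0, 5.0, 9.0)]
print(decisions)
```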
Table 8 offers the same information as Tables 4 and 6, but for the Computing Environment domain.
[41], 2019. Main idea: adaptive dropout; crowdsourcing method; improved SLME. Advantages: prevents overfitting; solves the problem of the lack of labeled data. Disadvantage: the model output depends on the initialization.
[42], 2019. Main idea: a DRL model for solving the problem of caching IoT data at the edge without knowledge of the future popularity of IoT data. Advantage: a trade-off between the loss of data freshness and the cost of communication. Disadvantage: cooperative caching in IoT systems with multiple edges is not considered.

3-5. Primary Studies of Application
In this section, the primary studies on the IoT and deep learning that focused on applications are investigated; they make up 19% of the primary studies. The articles in this section use common deep learning models for a particular or new application, and the authors' aim is to apply common deep learning models to different types of applications and compare them.
Sundaravadivel et al. [43] implemented a deep learning system for health monitoring called Smart-Log. The five-layer DLM was built on a perceptron neural network with compact hidden layers to regulate nutrition after meals. They introduced a new Bayesian-network-based algorithm for determining the nutritional features of food and suggesting meals and recipes, and they presented it together with an accurate analysis of different Bayesian classifiers, which showed good performance. The system included a smart sensor board connected to a mobile application; the board contained weight sensors for food, and the weight of the food was sent wirelessly to the cloud. Nutrition facts were captured with the smartphone camera through the mobile application, and the system then provided nutrient values. The user could access the nutrient calculations and predictions through the smartphone application.
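The text does not say which Bayesian classifiers were compared in [43]; Gaussian naive Bayes is one common choice and is sketched below in NumPy. It scores each class by log p(c) + Σ_j log N(x_j; μ_cj, σ²_cj), assuming features are independent given the class. The two features (sugar and fat per 100 g) and the two labels are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Gaussian naive Bayes: per-class feature means and variances plus class priors."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class, features treated as independent
        diff = X[:, None, :] - self.mean[None, :, :]
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.var)[None, :, :]
                          + diff ** 2 / self.var[None, :, :]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]

# Hypothetical foods: [sugar g, fat g] per 100 g; 0 = suitable after a meal, 1 = limit
X_train = np.array([[2.0, 1.0], [3.0, 2.0], [1.0, 1.5], [2.5, 0.5],
                    [20.0, 15.0], [25.0, 18.0], [22.0, 12.0], [30.0, 20.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = GaussianNaiveBayes().fit(X_train, y_train)
pred = model.predict(np.array([[2.0, 1.2], [24.0, 16.0]]))
print(pred)
```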
To monitor patients' nutrition, Vellappally et al. [44] installed chips in patients' teeth containing an electrochemical sensor that collects information about the foods consumed, including fat, salt, sugar, and so forth. The collected data could therefore be used to evaluate the quality of the food consumed. The collected information was then processed using bacterial optimization and a DL network, which reviewed the IoT information through a self-learning process. According to the analysis, the IoT device in the teeth reduced mastication problems. The IoT-based food-quality prediction system was implemented in MATLAB; data were collected from 53 patients, and data from 15 of them were used to test the model.
In another study, deep learning was used in the medical IoT, which can collect massive medical data from ultrasound images, radiography, and magnetic resonance imaging. Yao et al. [45] learned features from the input data using a CNN model trained with the back-propagation algorithm, and classified and analyzed various types of gallbladder stones.
Sun and his colleagues [46] employed a CNN, an RNN, and hashing to provide a user interface for natural-image and natural-language queries. Their proposed architecture consisted of four modules: image training, user query and text processing, image retrieval, and data storage. In this architecture, both semantic information and image cognition were important. They evaluated the method on the 4S Online Store and observed that it could be an effective platform.
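The retrieval step in [46] relies on hashing. A minimal sketch, assuming that images and queries have already been mapped (for example, by a CNN and an RNN) into a shared real-valued feature space: each vector is hashed to a binary code by thresholding at zero, and database items are ranked by Hamming distance to the query code. The feature vectors are hypothetical, and sign thresholding is a simple stand-in for whatever hashing scheme [46] uses.

```python
import numpy as np

def binary_codes(features):
    """Hash real-valued feature vectors to binary codes by thresholding at zero."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Indices of database items sorted by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Hypothetical embeddings of three stored images and of one (image or text) query
db = binary_codes([[0.9, -0.2, 0.5, -0.7],
                   [-0.3, 0.4, -0.1, 0.8],
                   [0.7, -0.5, 0.2, 0.1]])
query = binary_codes([0.8, -0.1, 0.6, -0.4])
order, dists = hamming_rank(query, db)
print(order, dists)
```

Comparing short binary codes is far cheaper than comparing real-valued vectors, which is the usual reason to hash before retrieval.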
In [47], an intelligent agricultural system based on deep learning was presented. In addition to predicting suitable crops for subsequent cultivation, the authors also optimized the irrigation system in the field. A wireless network was used to collect monitoring data on soil parameters and upload it to the cloud; after the data were analyzed with an LSTM, the results were sent to the user by SMS.
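For the LSTM analysis in [47], the sketch below runs one LSTM layer over a week of hypothetical daily soil readings (moisture, temperature, humidity) in NumPy, showing how the input, forget, and output gates update the cell and hidden states. The weights are random and untrained, the gate ordering is an implementation choice, and the readout that would turn the final hidden state into a crop or irrigation prediction is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b, hidden):
    """Run a single LSTM layer over a sequence.

    xs: (T, n_in) inputs; W: (n_in, 4*hidden); U: (hidden, 4*hidden); b: (4*hidden,).
    Gate order in the stacked weights: input, forget, candidate, output.
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in xs:
        z = x @ W + h @ U + b
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
        o = sigmoid(z[3 * hidden:])            # output gate
        c = f * c + i * g                      # update the cell state
        h = o * np.tanh(c)                     # new hidden state
        outputs.append(h)
    return np.array(outputs)

# Hypothetical normalized daily readings: [soil moisture, temperature, humidity] for 7 days
rng = np.random.default_rng(0)
n_in, hidden = 3, 8
W = rng.normal(0.0, 0.3, (n_in, 4 * hidden))
U = rng.normal(0.0, 0.3, (hidden, 4 * hidden))
b = np.zeros(4 * hidden)
week = rng.uniform(0.0, 1.0, (7, n_in))
H = lstm_forward(week, W, U, b, hidden)
print(H.shape)
```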
Wang et al. [48] applied deep learning to indoor localization, health-care sensing, and activity recognition. They provided a DL framework for RF sensing whose deep learning models included an autoencoder, a CNN, and an LSTM. The framework consisted of a data collection section, a preprocessing section, an offline section for training the deep learning model, an online section for testing data, and finally a conclusion section. The results indicated that the proposed framework was more accurate in all three applications.
The primary studies in the field of deep learning and IoT that focused on applications have been surveyed and analyzed, and our considerations are outlined in Table 9.

Results
Based on our study of the publications, the research questions posed in this paper are answered below.
Reply to RQ1: What are the most appropriate and common tools, simulators, and datasets in the Internet of Things and Deep Learning field?
As shown in Figure 4, searching the published databases yielded 151 articles in the field of IoT and DL. The recent trend in the number of articles shows a significant increase in research and development in this field. Hardware advances and increased processing speed may be among the major reasons for the growing research on DL in the IoT.
Reply to RQ2: What are the most appropriate and common tools, simulators, and datasets in the Internet of Things and Deep Learning field?
Several different deep learning tools and software packages available as of 2019 are outlined below. The following list was obtained by checking websites such as [49], [50], and [51] that cover tools and software associated with deep learning. Table 10 presents the programming environments used by the authors of the examined articles to simulate their ideas, together with the advantages, disadvantages, platform, and interface of each environment. TensorFlow was the most widely used among the articles that reported their programming environment, while Weka, R, and Caffe were used less frequently. Table 11 describes the basic characteristics of the datasets used in the studied articles.
Reply to RQ3: Which deep learning models and platforms have been used most in the IoT?
To answer this question, we examined the ideas expressed in the articles and the deep learning models they used. Table 12 outlines the deep learning models used in the articles and divides them into two categories, generative and discriminative. The Learning Model column specifies which of the three modes (unsupervised, supervised, or semi-supervised) each model belongs to, the Typical input data column shows the type of input data, and the table also specifies the characteristics of each model.
Intel Edison [60], [40]
Nvidia Jetson TX2 [28]: The chip contains a Kepler 192-core GPU coupled with a 2.3 GHz quad-core Cortex CPU and an extra low-power fifth core (LPC) designed for energy efficiency.
Nvidia Tesla [27]: Nvidia Tesla is Nvidia's brand name for its products targeting stream processing or general-purpose computing on graphics processing units (GPGPU).
Raspberry Pi [19]: Raspberry Pi is a series of small single-board computers developed in the United Kingdom by the Raspberry Pi Foundation to promote the teaching of basic computer science in schools and in developing countries.
MacBook Pro: The MacBook Pro has a 2.8 GHz Intel Core i7 processor and 16 GB of RAM.
Having studied the articles in the IoT and deep learning domains, we can categorize the challenges in this field into the following groups.

1. Challenges associated with data and model training
Processing and analyzing large volumes of information [19].
The existence of high-dimensional heterogeneous data with a variety of representations, excessive information, and multimodal features [18] [20].
The amount of information received from IoT sensors is very high [31]. Training a DLM requires massive amounts of labeled data, so labeling large volumes of data and making them suitable for training DLMs is difficult [22], [3]. The lack of labeled data for training models [41] is a challenge that still needs to be addressed.
Industrial data with complex multimodal features pose another challenge for data analysis [16].
IoT devices send a great deal of information to servers, and the traffic generated by this exchange is another challenge in this area [42].
Combining sensor data to extract latent (hidden) features [22] is also a challenge that can be addressed by deep learning.
Overfitting [41].

2. Resource Challenge
IoT devices are usually systems with low computing capability and limited memory and energy, whereas DL networks require a lot of resources to train their models [22]. Several approaches can be taken to deal with this challenge.
Presenting effective resource solutions for the uncertainty estimation problem [22].
Presenting solutions to accurately estimate uncertainty in predicting results obtained from deep learning models [22].

3. Network Challenge
On the IoT, different networks are connected. The combination of heterogeneous networks and the connections between them is also one of the issues to be addressed [24] [20] [31] [29].
The scope of available network size is limited.
Owners of such devices lack technical information about, or knowledge of, the devices' inherent vulnerabilities [27].
Poor safety and liveness properties [26].
Reply to RQ5: What are the open issues of deep learning along with IoT?
Generally, three key factors have been identified behind the recent widespread application of deep learning: the scale of available data, computational power, and network structure. Deep learning, with its high capability, can therefore play an important role in IoT issues.

Processing large amounts of data in complex classifications
The DL algorithm model has a far deeper structure than the two-layer structures of older algorithms. In some cases, the number of layers exceeds 100, which enables deep learning to process enormous amounts of data in complex classifications. To a large extent, DL works like the human learning process, detecting features layer by layer.
Deep learning helps to convert partial understanding to a deep general diagnosis, thus identifying the target subject.

Feature Extraction
Deep learning does not require manual feature engineering; instead, the computer extracts the relevant features itself. In this way, deep learning can extract as many features as possible from the subject of interest, including non-intuitive features that would be difficult or infeasible to describe manually. Finally, the more features are available, the more precise identification and detection can be.

Real-time applications
Deep learning can be used in real-time applications in a fog computing environment. It also enables more accurate decisions about whether to store data in the cloud or in the fog, reducing data transmission delay through proper management of data storage, and it allows faster data storage and analysis in an appropriate environment.

Fog and blockchain technology
Fog and blockchain technologies, when combined with deep learning, help companies gain more value from their IoT investments, overcome past constraints, and cover a wider range of IoT fields. Artificial intelligence and machine learning with deeper analysis support more accurate, immediate decision-making on data streams. Fog computing makes such systems scalable by extending the cloud's capabilities to the network edge. Also, by processing and analyzing data near the network edge, fog computing helps address latency, bandwidth, credit, and cost issues. Blockchain, in turn, provides security for IoT transactions and eliminates the need for a trusted central intermediary to connect devices.

Different time-sensitive devices
We can use distributed deep learning methods on different time-sensitive devices to reduce the burden of heavy computations. Inspired by the fast and intelligent processing of the human mind, we can also offer more suitable deep learning models for IoT applications.

Conclusion
The present research presented the findings of a systematic mapping study (SMS) combined with a systematic literature review (SLR) on the IoT with deep learning. Based on their main idea, the articles were divided into four sections. Network-related subjects made up the largest share (31%, 10 articles), and IEEE had the most publications in this area. The deep learning model used most in the reviewed articles was the CNN, and TensorFlow was the framework most often used to implement deep learning models on personal computers or dedicated deep learning platforms. Among the reviewed articles, challenges related to data and to training deep models were the most prominent. This study has some limitations. Only article titles were searched; extending the search scope could surface other articles. Finally, the study focused on English articles, and non-English articles were excluded.
The number of all articles by years and publishers