3.1 ZigBee networking
This paper implements the ZigBee communication function based on Z-Stack, the ZigBee protocol stack introduced by TI. Z-Stack supports a variety of microcontrollers, including the CC2530 system-on-chip, the CC2520 used with the MSP430 series, and the LM3S9B96 in the Stellaris series. The protocol stack supports a variety of network topologies and is widely used in the ZigBee industry.
The protocol stack defines how the communication hardware and software cooperate across the layers of the hierarchy. On the sending side, the packets submitted by the user pass down through each protocol layer in turn; each layer's entity adds its own identifying information in a defined format until the packet reaches the physical layer, where it is transmitted over the physical link as a binary stream to the receiver. On the receiving side, the packet passes up through each protocol layer in turn; each layer's entity extracts, in the predefined format, the information to be processed at that layer, until the data finally reaches the application layer.
Fig. 1 shows the structure of a ZigBee network. It contains two key roles, the coordinator and the terminal node, which together constitute the simplest ZigBee communication setup. The internal network communicates wirelessly at 2.4 GHz; the external network consists of peripheral devices such as sensors and the Internet. Control of household appliances and environmental monitoring are achieved through interaction between the internal and external networks. The coordinator is the hub of the entire ZigBee network: it scans the current network conditions, chooses an appropriate channel and network ID, and then starts the ZigBee network; it also assists in configuring security parameters and application bindings within the network. In short, the coordinator is primarily responsible for starting and configuring the network. Once this work is complete, it can switch to the router role or leave the current network, and such a change has no impact on the network as a whole. The terminal node is not responsible for the overall operation of the network; it only needs to be able to sleep and wake, sleeping when idle to extend standby time and waking quickly when it receives a wake-up command from the coordinator.
3.2 Establishing a traffic model based on machine learning
In machine-learning traffic analysis, effective assessment requires data for support and training. This paper uses Cambridge University's Moore dataset as the training and test set for traffic classification. The dataset was collected with a high-performance network monitor that provides timestamps with a resolution better than 35 nanoseconds, and it consists of many objects, each described by a set of features. The data were manually classified in large quantity; each object in each dataset represents a single TCP flow between a client and a server. The features of each object include a classification derived elsewhere and many derived features used as inputs to probabilistic classification techniques. The feature information is derived from header information alone, while the classification classes are derived using content-based analysis.
The dataset contains 10 sub-datasets, totaling 377,536 flows and 249 features. The 11 traffic types involved include WWW, FTP, DATABASE, P2P, SERVICE, MAIL, ATTACK, etc. The characteristics of each subset are shown in Table 1:
Table 1 Dataset characteristics
| Data subset | Duration (s) | Number of streams |
|-------------|--------------|-------------------|
| entry01     | 1821.8       | 24863             |
| entry02     | 1696.7       | 23801             |
| entry03     | 1724.1       | 22932             |
| entry04     | 1784.1       | 22285             |
| entry05     | 1794.9       | 21648             |
| entry06     | 1658.5       | 19384             |
| entry07     | 1739.2       | 55835             |
| entry08     | 1665.9       | 55494             |
| entry09     | 1664.5       | 66248             |
| entry10     | 1613.4       | 65036             |
Each sub-dataset has a different number of streams and a different duration. The streams it contains are TCP streams, so they have clear start and end markers. Each stream corresponds to a traffic type, so classification models can be trained with machine learning to classify real traffic.
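As a concrete starting point, the following is a minimal loading sketch. The file names, the ARFF format, and the column layout (features followed by a class label) are assumptions about how the sub-datasets are stored locally, not details given above:

```python
import pandas as pd
from scipy.io import arff

frames = []
for i in range(1, 11):
    # hypothetical file names: entry01.arff ... entry10.arff
    data, _meta = arff.loadarff(f"entry{i:02d}.arff")
    frames.append(pd.DataFrame(data))

moore = pd.concat(frames, ignore_index=True)
X = moore.iloc[:, :-1]                      # header-derived features
y = moore.iloc[:, -1].str.decode("utf-8")   # traffic class (WWW, MAIL, ...)
```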
3.3 Data preprocessing
The dataset itself is not always perfect. Datasets may mix data types, such as text, numbers, time series, and continuous and discrete values. The quality of the data may also be poor: noise, anomalies, missing values, incorrect entries, inconsistent dimensions, duplicates, skewed distributions, or too much or too little data. For the data to fit the model and match its requirements, the Moore dataset must be preprocessed: records that are inaccurate or inappropriate for the model are detected and then corrected or removed. Data preprocessing methods include removing unique attributes, handling missing values, attribute encoding, data standardization and regularization, feature selection, principal component analysis, and so on.
In machine learning, most algorithms, such as logistic regression, support vector machines (SVM), and the k-nearest neighbors algorithm, can only process numeric data and cannot process text. In sklearn, apart from the algorithms designed for text, all algorithms require numeric arrays or matrices as training input and cannot accept text-based data. Some of the data in this dataset contains the characters Y and N, which machine learning algorithms cannot process directly; attribute encoding maps Y to 1 and N to 0. During a network connection the maximum segment size may be unknown, so the dataset marks it with a '?', and some continuous features therefore contain '?' values. These are filled with the feature mean plus Gaussian white noise.
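A minimal sketch of these two steps, reusing the feature frame from the loading sketch above; the assumption that the Y/N flags arrive as plain strings and the noise scale of 0.01 standard deviations are both illustrative choices:

```python
import numpy as np
import pandas as pd

def preprocess(features: pd.DataFrame, noise_scale: float = 0.01,
               seed: int = 0) -> pd.DataFrame:
    """Attribute-encode Y/N flags and fill '?' gaps with mean + Gaussian noise."""
    df = features.replace({"Y": 1, "N": 0, "?": np.nan})
    df = df.apply(pd.to_numeric, errors="coerce")
    rng = np.random.default_rng(seed)
    for col in df.columns[df.isna().any()]:
        n_missing = int(df[col].isna().sum())
        mean = df[col].mean()
        std = df[col].std()
        if not np.isfinite(std) or std == 0:
            std = 1.0                       # guard against degenerate columns
        # fill each gap with the column mean plus low-amplitude white noise
        df.loc[df[col].isna(), col] = mean + rng.normal(
            0.0, noise_scale * std, size=n_missing)
    return df

X = preprocess(X)
```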
From Figure 2, it can be seen that the standard deviation and mean of some features in the dataset are unusually large, on the order of 10^17 and 10^15. For such feature data, data regularization is used: each individual sample is scaled to unit norm. The specific process is as follows:
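The unit-norm scaling is the standard per-sample L2 normalization: each sample vector $x$ is divided by its Euclidean length, so every sample ends up with norm 1:

$$x' = \frac{x}{\lVert x \rVert_2}, \qquad \lVert x \rVert_2 = \sqrt{\sum_{j=1}^{p} x_j^{2}}$$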
This paper fills and replaces the abnormal feature values, normalizes the data, and then recomputes the statistics. Figure 3 describes the dataset with several statistical features, including the standard deviation, mean, 25th percentile, and median.
3.4 Data feature processing
When data preprocessing is complete, we need to select meaningful features to feed into the machine learning algorithm and model for training. Exploratory analysis of the data reveals that too many features have been introduced. To model and analyze with these features directly, the original features must be screened further, retaining only the important ones. Generally, features are selected from two perspectives:
Whether a feature diverges: if a feature does not diverge, for example if its variance is close to zero, then the samples barely differ on this feature. Most of its values may be identical, or even the entire feature may take a single value, in which case the feature contributes nothing to distinguishing samples.
Relevance of features to the target: features that are highly relevant to the target should be selected. Apart from the variance method, every other method described in this paper is concerned with correlation.
According to the form of feature selection, there are three feature selection methods:
Filter: a filtering method that scores each feature according to divergence or correlation, sets a threshold or the number of features to select, and selects features accordingly.
Wrapper: a wrapping method that selects or excludes several features at a time according to an objective function, such as recursive feature elimination, which uses a base model for multiple rounds of training; after each round, the features with the smallest weight coefficients are eliminated, and the next round trains on the reduced feature set (see the sketch after this list).
Embedded: an embedded method that first trains a machine learning algorithm or model to obtain the weight coefficient of each feature, then selects features by coefficient from large to small. It is similar to the Filter method, but the quality of the features is determined by training.
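As an illustration of the Wrapper approach, here is a minimal sketch using scikit-learn's recursive feature elimination; the base estimator, step size, and target feature count are illustrative assumptions, not choices stated in this paper:

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Each round trains the base model, drops the `step` features with the
# smallest weight coefficients, and retrains on the remaining features.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=30,   # assumed target count
          step=5)
X_wrapped = rfe.fit_transform(X, y)  # X, y as in the loading sketch above
```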
To explore how different algorithms perform in the model, this paper compares several feature selection algorithms to obtain the best traffic classification model.
3.4.1 Variance filtering
To select the optimal hyperparameter, one can draw a learning curve to find the model's best point. However, this takes a lot of time and the improvement to the model is limited. In this paper, variance filtering with a threshold of 0.001 is used first to eliminate features that are obviously unneeded, after which a better feature selection method continues to reduce the feature count. Variance filtering removes the features whose variance is below the threshold, leaving 240 features.
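A minimal sketch of this step with scikit-learn, continuing the variable names of the earlier sketches:

```python
from sklearn.feature_selection import VarianceThreshold

# Drop every feature whose variance falls below 0.001; per the text above,
# this leaves 240 features.
selector = VarianceThreshold(threshold=0.001)
X_var = selector.fit_transform(X)
```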
After variance filtering, the next step is to select meaningful features that are related to the target label and can therefore provide useful information. Features unrelated to the label simply waste computing memory and may add noise to the model. Here, three common methods can be used to assess the correlation between features and labels: chi-square, F-test, and mutual information.
3.4.2 Chi-square filtering
Chi-square filtering is a correlation filter designed for discrete labels. The chi-square test computes a chi-square statistic between each non-negative feature and the label and ranks the features from high to low by that statistic. Combined with the scoring criterion, the K highest-scoring features are selected, removing the features most likely to be independent of the label and irrelevant to classification. In addition, if the chi-square test detects that all values in a feature are identical, it suggests applying variance filtering first. However, the choice of K is closely tied to model performance, so the best K must be searched for.
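A sketch of chi-square filtering with SelectKBest; K = 200 is a placeholder to be tuned, and since chi2 requires non-negative input the features are first scaled to [0, 1]:

```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

X_pos = MinMaxScaler().fit_transform(X_var)   # chi2 needs non-negative values
X_chi2 = SelectKBest(chi2, k=200).fit_transform(X_pos, y)  # K is a placeholder
```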
The F-test, also known as ANOVA or the test of homogeneity of variance, is a filtering method that captures the linear relationship between each feature and the label. It can be used for regression or classification: F-test classification applies to data whose labels are discrete variables, and F-test regression to data whose labels are continuous variables. The output statistics can be used directly to decide what K to set. Note that the F-test is only well behaved when the data follow a normal distribution, so the data should be transformed toward normality before F-test filtering. In essence, the F-test looks for a linear relationship between two sets of data, under the null hypothesis that no significant linear relationship exists. It returns two statistics, F and p. As with chi-square filtering, we select features whose p-values are below 0.05 or 0.01 as significantly linearly related to the label, while features with p-values above 0.05 or 0.01 are considered to have no significant linear relationship with the label and should be deleted.
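A sketch of F-test filtering: f_classif returns the F and p statistics described above, and the features with p below 0.05 are kept:

```python
import numpy as np
from sklearn.feature_selection import f_classif

F, p = f_classif(X_var, y)
keep = p < 0.05        # features significantly linearly related to the label
X_f = X_var[:, keep]
```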
Mutual information is a filtering method that captures any relationship, linear or non-linear, between each feature and the label. Like the F-test, it can be used for both regression and classification, with mutual-information classification and mutual-information regression variants. Both are used in the same way and with the same parameters as the F-test, but the mutual information method is more powerful: the F-test can only find linear relationships, while mutual information can find relationships of any kind. Mutual information does not return statistics analogous to p or F values; it returns an estimate of the mutual information between each feature and the target, taking values in [0, 1], where 0 indicates the two variables are independent and 1 indicates they are fully correlated.
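A corresponding sketch with mutual information; the k value is again a placeholder to be tuned:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# A score of 0 means the feature is independent of the label; higher scores
# mean more shared information, whether the relationship is linear or not.
X_mi = SelectKBest(mutual_info_classif, k=150).fit_transform(X_var, y)
```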
3.4.3 Lasso
The Lasso algorithm minimizes the residual sum of squares subject to the sum of the absolute values of the model coefficients being less than a constant. For variable selection it outperforms stepwise regression, principal component regression, ridge regression, partial least squares, and similar methods, and it better overcomes the shortcomings of traditional approaches to model selection. Lasso regression is a regularization method and a form of compressed (shrinkage) estimation: by constructing a penalty function it obtains a more refined model, compressing some coefficients and setting others exactly to zero. It thus retains the advantage of subset shrinkage and is a biased estimator suited to data with multicollinearity. Because some regression coefficients become exactly zero, the Lasso method can be used directly for feature selection, and it is widely applicable to model improvement and selection. Model selection is essentially a search for a sparse representation of the model, which can be accomplished by optimizing a loss-plus-penalty problem. The advantage of the Lasso method is that it compensates for the deficiencies of least squares estimation and the locally optimal estimates of stepwise regression, selects features well, and effectively resolves multicollinearity among features. Its objective function can be expressed as:
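$$\hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$

where $\lambda \ge 0$ sets the strength of the L1 penalty: the larger $\lambda$, the more coefficients $\beta_j$ are driven exactly to zero. As an illustration of this Embedded approach, the following sketch keeps the features whose Lasso coefficients remain non-zero; the alpha value is an assumption (LassoCV can choose it by cross-validation), and the label is numerically encoded because Lasso is a regression model:

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import LabelEncoder, StandardScaler

y_num = LabelEncoder().fit_transform(y)        # Lasso needs a numeric target
X_std = StandardScaler().fit_transform(X_var)  # L1 penalties are scale-sensitive
sfm = SelectFromModel(Lasso(alpha=0.01)).fit(X_std, y_num)
X_lasso = sfm.transform(X_std)                 # keeps non-zero-coefficient features
```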
3.5 Model Training
The training process of machine learning first defines a loss function, feeds in input samples, and obtains predictions by forward propagation. Comparing the predictions with the true samples gives the loss value; backpropagation then updates the weights, iterating until the loss is small and the accuracy reaches the desired value. The parameters at that point are those required by the model, i.e., the ideal model has been built. This paper divides the dataset into a training set and a test set in an 8:2 ratio. First, the training data are used to train a preliminary model; then the test data are used to evaluate it and check for overfitting.
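A minimal end-to-end sketch of the 8:2 split and the training check described above; the final classifier here is an illustrative choice, not the paper's stated model:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 8:2 split, stratified so every traffic class appears in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_lasso, y, test_size=0.2, random_state=0, stratify=y)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("train accuracy:", accuracy_score(y_tr, model.predict(X_tr)))
print("test accuracy: ", accuracy_score(y_te, model.predict(X_te)))
# A large gap between train and test accuracy signals overfitting.
```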