Network intrusion detection via tri-broad learning system based on spatial-temporal granularity

Network intrusion detection systems play a crucial role in protecting the integrity and availability of sensitive assets, and the traffic data they inspect contain a large amount of temporal, spatial, and statistical information. However, existing research makes little use of spatial-temporal multi-granularity data features or of the mutual support among different data features, which makes it difficult to identify anomalies specifically and accurately. Considering the distinctions among different granularities, we propose a framework called the tri-broad learning system (TBLS), which can learn and integrate features of three granularities. To explore the spatial-temporal meaning of the traffic information accurately, a feature dataset containing three granularities is constructed according to the characteristics of time, space, and data content. We then use basic broad learning units to extract abstract features of the different granularities and express these features in different feature spaces to enhance them separately. We use a normal distribution initialization method in BLS to optimize the weights of feature nodes and enhancement nodes for better detection accuracy. The merits of the proposed model are demonstrated on the UNSW-NB15, CIC-IDS-2017, CIC-DDoS-2019, and mixed traffic datasets. Experimental results show that TBLS outperforms the typical BLS in terms of various evaluation metrics and time consumption. Compared with other machine learning methods, TBLS achieves better performance metrics.


Introduction
The influence of high-speed Internet, 5G networks, wireless networks [1], and the Internet of Things is growing, and network services are becoming more widely available. The user scale and economic benefits of online entertainment, online travel, and online education have increased significantly. At the same time, countless network devices and applications, together with explosive growth in network data, make the network environment increasingly complex and introduce serious hidden dangers to network security. Cybercriminals are becoming more proficient at exploiting the openness of the Internet and are advancing their attacks at an alarming rate [2]. Therefore, network security has become a crucial concern in the informatization of all sectors. A network intrusion detection system (NIDS) is a network security device that monitors network transmissions in real time and raises an alarm or takes active response measures when a suspicious transmission is found [3]. In recent years, one of the main focuses of NIDS research has been the application of classic machine learning [4][5][6]. However, such techniques cannot fully leverage features, because they fit a predictor of the categories directly on the original features without constructing linear combinations of them or additional nonlinear features. Deep learning learns features from the data on its own and has also been very successful in the field of NIDS [7][8][9], but it takes a long time to train and consumes a lot of memory.
Recently, an innovative randomized neural network, the broad learning system (BLS), was proposed by Chen and Liu [10] on the basis of the flat network architecture [11]. The BLS does not need gradient descent to update its weights, so it is faster to compute. Moreover, when the accuracy of the network does not meet the requirements, it can be improved by increasing the width of the network, and the additional computation for increasing the width is negligible compared with that for increasing the number of layers in a deep network.
Network traffic data contain a large amount of time, space, payloads, and statistical information. Data analysis from different spatial-temporal granularities can provide different contributions for anomaly detection and analysis results. For anomalous attacks on specific protocols or services, the anomaly may only be reflected in the data related to several protocols or services, whereas other data are normal. Analyzing all the data together affects the model's judgment of anomalies. Therefore, it is difficult to capture the abnormal information shown in the local data features using the anomaly detection model with full data features. It is necessary to extract data features from different temporal and spatial dimensions.
The input of a typical BLS is the full set of data features, which, as noted above, is not suitable for a dynamically changing network environment. In this paper, we propose a novel tri-broad learning system based on spatial-temporal granularity, termed TBLS. Firstly, we divide cyberspace traffic data into spatial-temporal multi-granularity feature datasets according to their characteristics and use them as the input datasets. Secondly, parallel learning is performed in TBLS on the three granularities of time, space, and data content. The normal distribution initialization [12] is introduced into the process of generating feature nodes and enhancement nodes in TBLS for better prediction accuracy. Finally, the classification output is based on the features obtained from the joint mapping. The major contributions of this article are summarized as follows.
• We consider how various representations of the same network behavior at different spatial-temporal granularities affect the detection results, which allows better adaptation to complex network environments.
• We propose a novel TBLS model that uses enhancement nodes to supplement the feature nodes of each spatial-temporal granularity separately, which can learn deep representations of data at different granularities.
• A normal distribution initialization is introduced in TBLS to improve the suitability and effectiveness of weight calculation.
The remainder of this article is structured as follows. Section 2 reviews the related work. Section 3 presents the typical BLS. Section 4 describes the proposed TBLS. Section 5 presents the experimental setup. Section 6 discusses the experimental results. Finally, in Sect. 7, our conclusions are drawn.

Related works
In this section, we describe network intrusion detection using machine learning approaches and BLS.

NIDS based on machine learning approaches
Machine learning-based intrusion detection has received extensive attention from researchers. Viewed at a finer level, detection methods fall into the following categories: classification, clustering, and combination. Classification methods mainly refer to traditional machine learning methods, such as the one-class support vector machine [13]. Current machine learning has some limitations, such as its dependence on labeled data, and researchers address them by adopting transfer learning or few-shot learning. Singla et al. [14] proposed the use of adversarial domain adaptation to address the scarcity of labeled training data in a dataset by transferring knowledge gained from an existing dataset. Xu et al. [15] proposed a few-shot network intrusion detection method based on a meta-learning framework, which can detect novel samples from only a limited number of labels. An efficient multi-level correlation-based feature selection scheme in [16] selects the best features in subsequent levels based on parameters to reduce the dimension of the dataset. The hybrid system in [17] applies rough set theory and Bayes' theorem to enlarge the detection capacity and decrease the false alarm rate.

Clustering algorithms mainly include conventional clustering and hierarchical clustering. Chen et al. [18] combined k-means with a quantum-inspired ant lion optimizer, which can be used efficiently for data clustering and intrusion detection. However, there are no guidelines for detecting and interpreting these anomalous events. Mulinka et al. [19] used hierarchical clustering models to detect abnormal behaviors in multi-dimensional data, which can provide more fine-grained, unsupervised analysis capabilities. The cluster center initialization scheme in [20] computes semi-identical instances to avoid outliers as initial centers and to reduce the number of clustering iterations. The solution uses unsupervised cluster validity metrics to automatically explore data structures and provide meaningful descriptions of detected patterns, enabling network operators to interpret anomalies more simply and quickly.
Combination-based methods use multiple mechanisms to classify data points efficiently; among them, integration and fusion mechanisms are applied in network traffic anomaly detection. Based on the multi-dimensional features of network traffic, Zhang et al. [21] proposed a feature fusion method based on permutation and combination, which can exploit the complementary relationship among different features. Then, they performed stacked ensemble learning on multiple comprehensive feature datasets, which can detect abnormal behaviors robustly. Considering the sensitivity of the detection model to different types of attacks, Li et al. [22] used the probability output and classification confidence of a single classifier as training data to build a multi-class regression model so that ensemble learning adapts to different attacks. Combination techniques achieve higher accuracy and detection rates than single techniques, so combination-based approaches have greater advantages.

NIDS based on BLS
Broad learning is arousing widespread interest, and its application is being studied in many research domains, such as image classification [23,24], control [25,26], and pattern recognition [27]. There are also several existing works in the domain of network intrusion detection.
Li et al. [28] applied recurrent neural network and BLS learning algorithms to classify known network intrusions. Experimental results indicate that BLS achieved comparable performance and shorter training time because of its wide and deep structure. After that, they evaluated the performance of BLS models that employ radial basis functions and incremental learning for classifying network anomalies [29]. The authors claimed that the incremental BLS algorithm requires shorter training time because the weights are updated based only on new data, although it requires additional memory. Other members of their team also studied the use of BLS for network anomaly detection. Laura et al. [30] implemented the cascade of the feature mapping nodes, the cascade of the enhancement nodes, and the cascade of feature mapping nodes and enhancement nodes. The authors concluded that the cascade of the enhancement nodes requires significantly longer training time than other BLS variants. Subsequently, they proposed a broad learning-based DDoS detection system for communication networks [31]. The authors claimed that using BLS with cascades can usually achieve the best accuracy and F-score.

The works above indicate that the use of BLS for network intrusion detection has not yet been adapted to the specific setting of network security. The area is still in its infancy, with most researchers applying BLS directly or using BLS variants borrowed from other literature. In particular, they did not specifically consider the impact of traffic characteristics on anomaly detection, especially the correlation at different granularities.

Broad learning system
The broad learning system is a neural network structure that does not rely on a deep structure. The broad learning method is an incremental learning algorithm based on the flat network structure of the random vector functional link neural network (RVFL) [10]. The model structure is shown in Fig. 1 and includes four parts: input, feature nodes, enhancement nodes, and output. Unlike in the traditional RVFL structure, the input weight matrix of the BLS is not purely randomly generated; instead, the optimal weights are selected during the decoding process after encoding by a sparse autoencoder.
After the input samples of the broad learning method are transformed, the feature expression is mapped on the feature plane to form feature nodes, and the obtained feature nodes are then subjected to activation function nonlinear transformation to generate enhancement nodes. Feature nodes and enhancement nodes are connected together as the actual input signal of the system and output linearly through the connection matrix. The basic process describing the calculation of BLS is shown below.
Firstly, we are given the data $\{X \in \mathbb{R}^{N \times M}, Y \in \mathbb{R}^{N \times Q}\}$, where $M$ and $Q$ denote the dimensions of the input and the output, respectively, and $N$ represents the number of input samples. Assume that the feature nodes include $n$ groups and that the $i$th group contains $k_i$ nodes. The input data $X$ is first mapped into a series of random feature spaces, as expressed in Eq. (1).

$$Z_i = \phi(X W_{ei} + \beta_{ei}), \quad i = 1, 2, \dots, n \tag{1}$$

Here $\phi(\cdot)$ is the feature mapping function, and the weights $W_{ei} \in \mathbb{R}^{M \times k_i}$ and the bias term $\beta_{ei} \in \mathbb{R}^{N \times k_i}$ are generated randomly with the proper dimensions. We collect the $Z_i$ into $Z^n \triangleq [Z_1, Z_2, \dots, Z_n] \in \mathbb{R}^{N \times \sum_{i=1}^{n} k_i}$, and $Z^n$ is further input to the enhancement nodes. Secondly, assume that the enhancement nodes include $m$ groups and that the $j$th group contains $p_j$ nodes; the activations of the $j$th group can be expressed as Eq. (2).

$$H_j = \xi(Z^n W_{hj} + \beta_{hj}), \quad j = 1, 2, \dots, m \tag{2}$$

Here $W_{hj} \in \mathbb{R}^{(\sum_{i=1}^{n} k_i) \times p_j}$ and $\beta_{hj} \in \mathbb{R}^{N \times p_j}$ represent the random matrix and the bias, respectively, and $\xi(\cdot)$ is an optional nonlinear activation function. Unlike that of the feature nodes, the coefficient matrix of the enhancement nodes is not a plain random matrix but a random matrix after orthogonal normalization. The output of the enhancement nodes is denoted as $H^m \triangleq [H_1, H_2, \dots, H_m]$. Finally, the combined matrix obtained by connecting the feature nodes and the enhancement nodes is used as the actual input of the system, and the output of BLS is described as Eq. (3).

$$Y = [Z^n \mid H^m] \, W_n^m \tag{3}$$

Here $W_n^m$ represents the output connection weight matrix. $W_n^m$ can be computed by ridge regression as shown in Eq. (4), where $A = [Z^n \mid H^m]$, $C$ is a positive constant, and $I$ is an identity matrix.

$$W_n^m = \left(A^{\top} A + \frac{I}{C}\right)^{-1} A^{\top} Y \tag{4}$$
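The calculation described in this section can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes plain random feature weights (the sparse autoencoder refinement and the orthogonal normalization of the enhancement weights are omitted), a linear feature map, a tanh enhancement activation, equal group sizes, a bias row broadcast over all samples, and the common ridge form $(A^{\top}A + I/C)^{-1}A^{\top}Y$. The function names `bls_fit` and `bls_predict` and all default sizes are hypothetical.

```python
import numpy as np


def bls_fit(X, Y, n_groups=3, k=8, m_groups=2, p=16, C=1e3, seed=0):
    """Fit a minimal BLS and return its random maps and output weights."""
    rng = np.random.default_rng(seed)
    M = X.shape[1]
    # Feature nodes: n groups of k nodes each, Z_i = X W_ei + b_ei (linear map, assumed).
    W_e = [rng.standard_normal((M, k)) for _ in range(n_groups)]
    b_e = [rng.standard_normal(k) for _ in range(n_groups)]
    Z = np.hstack([X @ W + b for W, b in zip(W_e, b_e)])
    # Enhancement nodes: m groups of p nodes each, H_j = tanh(Z W_hj + b_hj).
    W_h = [rng.standard_normal((Z.shape[1], p)) for _ in range(m_groups)]
    b_h = [rng.standard_normal(p) for _ in range(m_groups)]
    H = np.hstack([np.tanh(Z @ W + b) for W, b in zip(W_h, b_h)])
    # Output weights by ridge regression on A = [Z | H] (assumed regularizer I / C).
    A = np.hstack([Z, H])
    W_out = np.linalg.solve(A.T @ A + np.eye(A.shape[1]) / C, A.T @ Y)
    return {"W_e": W_e, "b_e": b_e, "W_h": W_h, "b_h": b_h, "W_out": W_out}


def bls_predict(model, X):
    """Recompute [Z | H] with the stored random maps and apply the output weights."""
    Z = np.hstack([X @ W + b for W, b in zip(model["W_e"], model["b_e"])])
    H = np.hstack([np.tanh(Z @ W + b) for W, b in zip(model["W_h"], model["b_h"])])
    return np.hstack([Z, H]) @ model["W_out"]
```

With one-hot labels in `Y`, the predicted class of each sample is the column index of the largest entry in the output of `bls_predict`. Only `W_out` is learned; all other weights stay at their random initial values, which is why no gradient descent is needed.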

Proposed tri-broad learning system for NIDS
To fuse spatial-temporal multi-granularity feature data, this paper proposes a fusion framework that extends and improves BLS so that it can effectively learn from such data.

Fig. 2 Intrusion detection process
The intrusion detection process is shown in Fig. 2, where the pcap file is the raw traffic captured.

Feature dataset construction based on temporal and spatial granularity division
In this paper, we construct feature data to provide a dataset basis for building efficient anomaly detection algorithms. Due to the complex spatial-temporal granularity of cyberspace traffic data, we construct datasets with multiple subspaces based on the characteristics of specific traffic types in different spatial-temporal granularities.
In particular, we analyze traffic data from three aspects: temporal granularity, spatial granularity and data content.
Temporal granularity: It refers to the time period in which the content of each data item occurs, and different time units can be used such as hour, day, week, month, quarter, and year. It can also be the duration of the event, such as session duration and single service request time.
Spatial granularity: It refers to the scope of content covered in each data item, such as IP address, subnet, entire network, different service types, different network-layer protocols, different ports, etc. These ranges have containment and intersection relations, and the intersection of multiple different spatial dimensions can define an exact space.
Data content: It refers to the network behavior data obtained in determining the spatial-temporal granularity, such as the number of data packets, the size of data packets, number of sessions, a total data volume of sessions, application request types, application load snapshots, etc.
Based on the above data granularities, the basic data required by the detection algorithm can be constructed. Examples are the number of sessions per day in which network A accesses the SMTP service at address IP1, or the URI information with which network B accesses the Web service at address IP2. Using the multi-granularity characteristics of time, space, and data content in cyberspace traffic data, the construction realizes the spatial-temporal multi-granularity division and feature fusion of the data. More specifically, we divide the feature set according to the meaning of each feature item and the description of the above three granularities. For example, the IP address belongs to the spatial granularity.
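As an illustration of this division, the sketch below splits one flow record into three per-granularity feature vectors. The feature names and their assignment are hypothetical placeholders chosen to match the definitions above (durations as temporal, addresses, ports, and protocols as spatial, packet and byte counts as data content); the actual assignments used in this paper are those listed in its tables.

```python
# Hypothetical assignment of flow features to the three granularities.
GRANULARITY = {
    "temporal": ["flow_duration", "flow_iat_mean"],
    "spatial": ["src_ip", "dst_ip", "dst_port", "protocol"],
    "content": ["total_fwd_packets", "total_bytes"],
}


def split_by_granularity(record, groups=GRANULARITY):
    """Split one flow record (a dict) into one feature vector per granularity."""
    return {name: [record[col] for col in cols] for name, cols in groups.items()}
```

Each of the three resulting vectors would then feed its own broad learning unit, so a protocol-specific anomaly that is visible only in a few spatial features is not diluted by the remaining columns.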

Tri-broad learning system
The tri-broad learning system (TBLS) is a learning framework with BLS as the basic unit. As shown in Fig. 3, this structure is mainly composed of three BLS units that handle the fusion of the different granularities. In the figure, Z and H represent feature nodes and enhancement nodes, respectively, and the subscripts T, S, and D represent temporal granularity, spatial granularity, and data content. When data of the three granularities are input into the system, the process of model training can be divided into the following three parts:
1. Using a broad learning unit to extract the features of each granularity, mainly including the features mapped by the feature nodes and those mapped by the enhancement nodes;
2. Combining the features of the three granularities in parallel as the final extracted feature;
3. Learning the output weight matrix, where the generalized inverse obtained by ridge regression directly yields the globally optimal solution and hence the output category.
We compute three granular features in parallel using threads. Assuming that the number of input samples of the TBLS model is N and the number of feature groups and enhancement groups of TBLS are n and m, respectively, the feature expression of temporal granularity is shown in Eq. (5).
Z n T is generated by a BLS unit and represents the characteristics of temporal granularity. Feature nodes and enhancement nodes can be expressed as Eqs. (6) and (7).
In the same way, the characteristics of spatial granularity, together with the feature nodes and enhancement nodes generated by another BLS unit, are shown in Eqs. (8)-(10). In addition, the characteristics of the granularity of data content are expressed as Eqs. (11)-(13). TBLS uses different input features and the same BLS algorithm at the three granularities. BLS uses sparse coding in the process of generating feature nodes. Sparse representation effectively reduces the linear correlation of newly generated feature nodes, so redundant information is automatically removed during feature extraction and the computational complexity of training is reduced. According to the concept of multi-granularity machine learning, the subsequent process of fusing different granularities therefore only needs to consider complementary information. To better learn the common characteristics of the three granularities of time, space, and data content, they need to be mixed so that the three granularities are mapped to the same sample space. Considering the learning characteristics of neural networks, TBLS connects the features of the three granularities in parallel as the final extracted feature, as in Eq. (14).
A BLS in the strict sense uses feature nodes and enhancement nodes jointly as features and lets them act on the output through different weights. Therefore, the output connection matrix of the TBLS model contains the weights of the feature node layers and the enhancement node layers of all three granularities, and it can be easily obtained by the generalized inverse of ridge regression shown in Eq. (15).
Among them, C is a positive constant, I is an identity matrix, and Y is expected output matrix composed of sample labels.
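The three training steps can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes plain standard-normal random weights, a linear feature map, a ReLU enhancement activation, equal group sizes, and the ridge form $(A^{\top}A + I/C)^{-1}A^{\top}Y$ with $A$ the column-wise concatenation of the three per-granularity blocks $[Z \mid H]$. The names `make_unit`, `unit_features`, `tbls_features`, and `tbls_fit` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_unit(in_dim, n=2, k=6, m=2, p=10, seed=0):
    """Draw the random weights of one BLS unit (n feature groups, m enhancement groups)."""
    rng = np.random.default_rng(seed)
    W_e = rng.standard_normal((in_dim, n * k))  # n feature groups stacked column-wise
    b_e = rng.standard_normal(n * k)
    W_h = rng.standard_normal((n * k, m * p))   # m enhancement groups stacked column-wise
    b_h = rng.standard_normal(m * p)
    return W_e, b_e, W_h, b_h


def unit_features(X, unit):
    """Step 1: return [Z | H] for one granularity."""
    W_e, b_e, W_h, b_h = unit
    Z = X @ W_e + b_e
    H = np.maximum(Z @ W_h + b_h, 0.0)  # ReLU enhancement nodes (assumed activation)
    return np.hstack([Z, H])


def tbls_features(parts, units):
    """Step 2: compute the T, S, and D blocks in three threads and concatenate them."""
    keys = ["T", "S", "D"]
    with ThreadPoolExecutor(max_workers=3) as pool:
        blocks = list(pool.map(lambda g: unit_features(parts[g], units[g]), keys))
    return np.hstack(blocks)


def tbls_fit(parts, Y, units, C=1e3):
    """Step 3: output weights of the fused feature matrix by ridge regression."""
    A = tbls_features(parts, units)
    return np.linalg.solve(A.T @ A + np.eye(A.shape[1]) / C, A.T @ Y)
```

Here `parts` and `units` are dictionaries keyed by "T", "S", and "D". Stacking the groups column-wise is equivalent to computing each group separately and concatenating the results, because every enhancement group of a unit reads the full feature block of that unit only.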

Normal distribution initialization
In BLS, the connections to the feature nodes and enhancement nodes are randomly selected, but if these initialized weights are not proper, the calculation will be ineffective. Inspired by the learning style of deep learning, the normal distribution initialization [12] is adopted in TBLS. The network parameters other than the output weights are set by normal distribution initialization for better prediction accuracy, whereas the output weights are still computed by ridge regression to avoid overfitting. Proper parameter initialization should avoid exponentially scaling the signal up or down during forward propagation. The normal distribution initialization scheme keeps the variance of the state value unchanged during forward propagation and the variance of the gradient of the activation value unchanged during back propagation. We use an initialization method appropriate for the corresponding activation function. The normal distribution initialization is a method suitable for ReLU, and the calculation formula is shown in Eqs. (16) and (17). Initializing according to a Gaussian distribution with a mean of 0 and a variance of $\sqrt{2 p_j / k_i}$ ensures that the input variance scale of each layer is consistent. Here, $k_i$ represents the number of feature nodes included in each feature mapping, and $p_j$ represents the number of enhancement nodes included in each enhancement mapping. Besides, (⋅) represents the orthogonal normalization function. The normal distribution initialization makes the variances of the feature layer and the enhancement layer roughly equal. It also makes the output distributions of the feature layer and the enhancement layer very even.
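The variance-preserving property described here can be checked numerically. The sketch below uses the widely known ReLU-oriented rule of drawing zero-mean Gaussian weights with standard deviation $\sqrt{2/\text{fan\_in}}$ as a stand-in; this is an assumption, and the exact scale used in Eqs. (16) and (17) may differ. The function name `he_normal` and the layer sizes are hypothetical.

```python
import numpy as np


def he_normal(fan_in, fan_out, rng):
    """Zero-mean Gaussian weights with standard deviation sqrt(2 / fan_in)."""
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)


rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 64))             # inputs with unit second moment
h = np.maximum(x @ he_normal(64, 128, rng), 0)  # one ReLU layer
second_moment = float(np.mean(h ** 2))          # stays close to that of the input
```

The pre-activation variance is about 2 under this rule, and ReLU discards roughly half of the signal energy, so the mean square of the layer output stays near that of its input instead of growing or shrinking from layer to layer.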
Based on the description above, the procedure of TBLS is summarized in Algorithm 1. We use multiple data features from different temporal and spatial granular spaces and consider anomaly detection at different levels to increase detection accuracy. The main computational cost of Algorithm 1 lies in the matrix calculations for $Z_i$ and $H_j$. Assuming that the feature dimension of a certain granularity is $o$, the complexity is of order $O(\cdots \sum_{j=1}^{m} p_j)$ owing to the parallel computation of the three granularities, where $o < M$. In particular, the feature dimension $o$ and the parameters $n$, $m$, $k$, $p$ are relatively small fixed values, so the overall complexity of the TBLS algorithm is low.

Experiment setup
In this section, we evaluate the performance of the proposed TBLS in detail and conduct a series of related experiments on four datasets: the three public intrusion detection datasets UNSW-NB15, CIC-IDS-2017, and CIC-DDoS-2019, and a mixed traffic dataset. These datasets contain a wide range of attack scenarios. All experiments are conducted on an Intel Xeon CPU at 2.60 GHz with 16 GB RAM, using Python 3.7 on a 64-bit Ubuntu 16.04 LTS operating system.

UNSW-NB15
The Australian Centre for Cyber Security (ACCS) established UNSW-NB15 in 2015 to reflect modern network traffic patterns [32]. This dataset contains a large number of low-footprint intrusions and deeply structured network traffic information, including normal data and nine types of attacks [33]. The attack types are Reconnaissance, Backdoor, DoS, Exploits, Analysis, Fuzzers, Worms, Shellcode, and Generic. As shown in Table 1, we divide the features of the UNSW-NB15 dataset into three granularities. Time refers to the time period in which the content of each data item occurs, space refers to the scope of content covered in each data item, and data content refers to the network behavior data obtained once the spatial-temporal granularity is determined. In this work, we use the training and testing partitions of the dataset. Table 2 shows the record distribution of the training and testing datasets for UNSW-NB15.

CIC-IDS-2017
The Canadian Institute for Cybersecurity (CIC) established CIC-IDS-2017 in 2017 [34]. This dataset collects normal traffic and common network attack traffic from Monday to Friday and provides real network data packets [35]. As shown in Table 3, we divide the features of the CIC-IDS-2017 dataset into three granularities, and the meaning of each granularity is the same as introduced above. In detail, the dataset contains 2,830,743 records distributed across 8 files, and each record contains 78 different features with its label. Taking into account the huge amount of traffic, we choose the Wednesday working-hours set for the experiments. The distribution statistics of the CIC-IDS-2017 (Wed.) dataset are shown in Table 4. In this work, we use 20% of the records as the training dataset and 80% as the testing dataset.

CIC-DDoS-2019
The CIC-DDoS-2019 dataset is the most recent dataset shared by the CIC, with flow features extracted using CICFlowMeter [36]. The dataset is available as pcap files or as prepared CSV files and contains 50,063,112 records of 13 attack types in total [37]. As shown in Table 5, we divide the features of the CIC-DDoS-2019 dataset into three granularities, in which the spatial granularity contains the source and destination IP, port, and protocol; the features are similar to those of the CIC-IDS-2017 dataset. We selected samples of normal traffic and of each attack from the data created between 9:43 … Table 6, where the attack types are Portmap, Syn, MSSQL, UDP, NetBIOS, LDAP, and UDPLag.

Mixed traffic
To further verify the effectiveness of TBLS in practical application scenarios, we construct a mixed traffic dataset that includes real traffic and recorded traffic. The real traffic is the daily traffic of the laboratory LAN captured with Wireshark, and the recorded traffic consists of some attack data from the CIC-DDoS-2019 dataset [37]. Including data from the CIC-DDoS-2019 dataset increases the variety of attack types. For the collected pcap files, we use CICFlowMeter to extract basic flow features. The extracted flow features are divided into three granularities, as shown in Table 7. The mixed traffic dataset contains four attacks: Portmap, MSSQL, UDP, and NetBIOS. We launch attacks such as MSSQL ourselves, and the other attacks come from the CIC-DDoS-2019 dataset. Table 8 shows the distribution of the training and testing datasets for the mixed traffic.

Data preprocessing
Data preprocessing is necessary to refine the raw data before performing the anomaly detection evaluation. For the UNSW-NB15 dataset, we convert nominal attribute values, such as the protocol, service, and state features, into numeric values. Taking the feature 'service' as an example, this feature has 12 possible values, which are represented by the digits 1 to 12, respectively. For the CIC-IDS-2017 dataset, the feature 'Fwd Header Length' appears twice, and 'Flow Bytes/s' and 'Flow Packets/s' include abnormal values such as 'Infinity' and 'NaN'. Therefore, we remove these feature items. For the CIC-DDoS-2019 and mixed traffic datasets, we convert IP addresses to integers. Similarly, we convert the type label to a numerical representation, e.g., 0 represents the normal type, 1 represents the Reconnaissance type, 2 represents the Backdoor type, etc.
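The preprocessing steps above can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' pipeline: the helper names are hypothetical, records are assumed to be dictionaries of column name to value, IPv4 addresses are assumed, and the order in which codes are assigned to nominal values (first appearance) is an assumption.

```python
import ipaddress
import math


def ip_to_int(addr):
    """Map a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(addr))


def encode_nominal(values):
    """Assign the digits 1..K to the K distinct values in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return [codes[v] for v in values], codes


def drop_bad_columns(rows, columns):
    """Remove every column that holds NaN or infinity in at least one row."""
    bad = {c for c in columns for r in rows if not math.isfinite(float(r[c]))}
    keep = [c for c in columns if c not in bad]
    return [{c: r[c] for c in keep} for r in rows], keep
```

For instance, `ip_to_int("192.168.1.1")` gives 3232235777, and a column containing the string 'Infinity' in any row is dropped as a whole, which mirrors the removal of the affected feature items rather than of individual records.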

Performance metrics
The proposed TBLS model is compared and evaluated on seven performance indicators, including accuracy, recall, precision, F-measure, and false-positive rate (FPR). The corresponding calculation formula is shown in Eqs.
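For reference, the standard confusion-matrix definitions of five of these indicators can be written as a short function. The sketch assumes a binary setting with true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn), and it assumes non-degenerate counts so that no denominator is zero; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called detection rate
    f_measure = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)     # share of normal traffic flagged as attack
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fpr": fpr,
    }
```

For multi-class results, the same quantities can be computed per class and then averaged with weights proportional to the number of samples in each class.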

Results and discussion
In this section, experiments are designed and conducted to evaluate the TBLS model. To evaluate the robustness of our model fairly and quickly, we set the parameters uniformly as shown in Table 9. Besides, we do not perform normalization uniformly because of the sensitivity of features.

Comparison with typical BLS
To test the fusion effect of the proposed model, we compare it with the typical BLS. Tables 10 and 11 give the detection results of BLS and TBLS, including the averages and standard deviations over 20 runs. To make the models comparable, we set the numbers of feature groups and enhancement groups of BLS to match the totals used by TBLS, i.e., 3n and 3m. Note that the TBLS model does not rely solely on increasing the number of feature groups and enhancement groups to improve the detection accuracy. As shown in Table 10, the results of TBLS on the UNSW-NB15 dataset are better than those of BLS, except for precision and FPR. In general, TBLS has the best comprehensive performance and is more stable, and its F-measure reaches 0.8415. For the CIC-IDS-2017 dataset, TBLS increases accuracy, recall, precision, F-measure, and AUC by up to 3%, and the FPR is also greatly reduced. The TBLS results on the CIC-DDoS-2019 dataset are better than those of BLS, except for precision. More specifically, TBLS increases AUC by up to 2%. For the mixed traffic dataset, all evaluation indicators of TBLS are better than those of BLS. Specifically, the F-measure increases by nearly 4%. Therefore, the experimental results show that TBLS performs well at detecting actual abnormal traffic. The detection rate on the UNSW-NB15 dataset is lower because the granularity of its extracted features is not fine enough, whereas the other three datasets use newer feature extraction tools such as CICFlowMeter.
In Table 11, we use weighted averaging for multi-class classification. As can be seen, TBLS outperforms BLS on all datasets. For example, TBLS increases precision by 1.88%, recall by 1.95%, and F-measure by 1.93% on the CIC-IDS-2017 dataset, and increases precision by 6.88%, recall by 0.16%, and F-measure by 2.12% on the CIC-DDoS-2019 dataset. Moreover, the standard deviation is small, indicating that the results of individual TBLS runs stay close to the average, that is, the model is more stable. Overall, the results show that TBLS outperforms the typical BLS in both binary and multi-class classification. Figure 4 shows the run time of the typical BLS and TBLS, both with 3n feature groups and 3m enhancement groups in total. As we observe, the training time (TT) and detection time (DT) of TBLS are shorter. The main reason is that computing an enhancement node from the feature groups involves a matrix multiplication. The BLS uses 3n feature groups to calculate each enhancement node, which makes the matrix multiplication computationally expensive. Our TBLS uses only n feature groups to calculate each enhancement node. In this case, the matrix multiplication is cheaper, so the computational complexity of TBLS is lower.
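The run-time argument can be made concrete by counting scalar multiplications in the enhancement step. The sketch below is a back-of-the-envelope estimate under stated assumptions: dense matrix products, the same k nodes in every feature group and p nodes in every enhancement group, and a cost of N × (feature width) × (enhancement width) per unit. It ignores the feature mapping itself and the ridge regression, whose input width is the same for both models. The function names are hypothetical.

```python
def enhancement_multiplies(N, feature_groups, k, enhancement_groups, p):
    """Scalar multiplications needed to form all enhancement nodes of one BLS unit."""
    return N * (feature_groups * k) * (enhancement_groups * p)


def compare_costs(N, n, k, m, p):
    """Return the enhancement cost of one wide BLS and of three TBLS units."""
    bls = enhancement_multiplies(N, 3 * n, k, 3 * m, p)  # one unit, 3n and 3m groups
    tbls = 3 * enhancement_multiplies(N, n, k, m, p)     # three units, n and m groups
    return bls, tbls
```

Under these assumptions the single BLS needs 9Nnkmp multiplications while the three TBLS units together need 3Nnkmp, a threefold reduction before any benefit from running the units in parallel.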

Effect of the normal distribution initialization
To evaluate the effect of the normal distribution initialization on detection, we compare it with simple random initialization. Tables 12 and 13 compare the detailed performance results of the two schemes in anomaly detection. We observe that the normal distribution initialization performs well most of the time. For the UNSW-NB15 dataset, the gain from the normal distribution initialization over simple random initialization is relatively modest. For the CIC-IDS-2017 dataset, the normal distribution initialization gives the best result on all indicators in both binary and multi-class classification. Moreover, TBLS increases accuracy, recall, precision, F-measure, and AUC by nearly 1%, and the FPR is also reduced. On the CIC-DDoS-2019 dataset in binary classification, the normal distribution initialization increases precision by 0.25%, recall by 5.5%, F-measure by 3.13%, and AUC by 2.76%. For the mixed traffic dataset, the F-measure of multi-class classification increases by nearly 2%, although the F-measure of binary classification is slightly worse. Moreover, TBLS improves the AUC and reduces the FPR. In general, changing the initialization method can improve the detection accuracy under certain data conditions.

Parameter sensitivity analysis
To compare the influence of different parameters on the performance of the model, we also carry out a parameter sensitivity analysis on the number of TBLS nodes. Figure 5 shows the results as the numbers of feature nodes and enhancement nodes each vary over {100, 150, 200, 250, 300}. For the UNSW-NB15 dataset, the surface is very smooth, indicating that the numbers of feature nodes and enhancement nodes have little effect on the performance metrics. For the CIC-IDS-2017 dataset, the surface fluctuates slightly, but not significantly. For the CIC-DDoS-2019 and mixed traffic datasets, the accuracy is very stable, but the F-measure fluctuates somewhat. In general, we conclude that the TBLS fusion framework has good robustness and stability, which shows that the network we set can fit the training dataset well.

Evaluation results
To further explore the novelty and efficiency of the proposed TBLS model, we compare its performance with that of classical machine learning algorithms, deep learning algorithms, and current state-of-the-art schemes. Tables 14, 15, and 16 report the comparison results for binary classification. In Table 14, we consider naive Bayes (NB) and logistic regression (LR) algorithms for comparison. As can be seen, TBLS surpasses its competitors on all metrics on the four datasets. For example, compared with LR on the UNSW-NB15 dataset, TBLS increases accuracy by 14.08%, precision by 2.96%, recall by 18.90%, F-measure by 14.26%, and AUC by 11.30%, and reduces FPR by 19.29%.
Fig. 5 The effect of hidden layer nodes on TBLS performance
In Table 15, we consider deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) algorithms for comparison. Each deep learning algorithm is trained for 10 rounds, with 3, 4, and 3 hidden layers for DNN, CNN, and LSTM, respectively.
In Table 16, we compare TBLS with state-of-the-art schemes such as genetic algorithm-logistic regression (GA-LR) [38] and CNN and bi-directional LSTM (CNN-BiLSTM) [39]. For the UNSW-NB15 dataset, the recall of TBLS is 16.28% higher than that of CNN-BiLSTM, although its precision is 7.84% lower. For the CIC-DDoS-2019 dataset, TBLS increases precision by 1.83% and recall by 0.18% compared with CNN-BiLSTM. TBLS outperforms GA-LR and CNN-BiLSTM with higher accuracy, recall, and precision on the CIC-IDS-2017 and mixed traffic datasets. Therefore, the experimental results validate the effectiveness of TBLS in a real-world network security environment.
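For reference, the binary-classification metrics used in these comparisons can be computed from a confusion matrix as in the sketch below. It uses the standard definitions and assumes that label 1 denotes attack traffic (the positive class) and label 0 denotes normal traffic. The function name and the example labels are hypothetical. AUC is omitted because it requires prediction scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F-measure, and FPR for 0/1 labels (1 = attack)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # false positive rate: normal flagged as attack
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```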

Conclusion
In this paper, a framework named TBLS is proposed for intrusion detection systems. According to the correlation of traffic information, the benchmark datasets are divided into spatial-temporal feature datasets. To improve the training efficiency and classification accuracy, the TBLS algorithm uses a broad learning method, which is a flat-layer network architecture. We improve the typical BLS by extending it to TBLS and using the normal distribution initialization when generating feature nodes and enhancement nodes for better prediction accuracy. TBLS can effectively extract the rich information of traffic at different granularities and complete the task of fusion learning and classification. The proposed approach is applied to four datasets: UNSW-NB15, CIC-IDS-2017, CIC-DDoS-2019, and mixed traffic. Experimental results show that TBLS significantly outperforms the typical BLS. In particular, it enables the system to achieve better detection performance while maintaining high speed and robustness. The temporal and spatial characteristics of traffic behave differently at different network levels. Different network levels have different protocols, and the lower levels provide different services for the upper levels. For future work, we plan to analyze the traffic at different network levels and build multiple individual broad learning systems. In addition, the weights of time, space, and statistics have different influences on the detection results. The weight given to each input by the self-attention mechanism depends on the interactions between the input items. In the future, we can apply the attention mechanism to fuse the three granularities to explore an improved model.
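As a rough illustration of how self-attention weights depend on interactions between inputs, the sketch below applies scaled dot-product self-attention to three granularity feature vectors. This is only one possible reading of the future-work idea, not a design described by the authors. Identity query, key, and value projections (no learned weights), mean pooling, and the example vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention_fuse(features):
    """Fuse equal-length vectors (one per granularity) with self-attention.

    Assumption: queries, keys, and values are the raw vectors themselves;
    a trained model would use learned projections instead.
    """
    d = len(features[0])
    attended = []
    for q in features:
        # Each weight depends on the interaction (dot product) between q and every input.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in features])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)]
        )
    # Mean-pool the attended vectors into a single fused representation.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(d)]


if __name__ == "__main__":
    # Hypothetical abstract features for the time, space, and statistics granularities.
    time_f = [0.9, 0.1, 0.0]
    space_f = [0.2, 0.7, 0.1]
    stat_f = [0.1, 0.2, 0.8]
    print([round(v, 4) for v in self_attention_fuse([time_f, space_f, stat_f])])
```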