IoT Device Identification Based on Network Traffic

doi:10.21203/rs.3.rs-3348638/v1

With the rapid development of Internet of Things (IoT) technology, the widespread deployment and heterogeneity of IoT devices bring new challenges to device identification. For network administrators, knowing which IoT devices are connected to the network is essential. However, existing IoT device identification methods still have shortcomings: they depend on a large number of features, on features that are easily spoofed (such as IP addresses), or on packet payloads. To address these issues, this paper proposes an IoT device identification method based on network traffic. The method takes the network traffic of IoT devices over a short period of time, extracts protocol statistical features and flow-level statistical features, and uses them to identify IoT device categories and device types. The proposed method was evaluated on two publicly available datasets, and the results show that it identifies devices accurately and compares favorably with existing methods.

IoT

Device identification

Network traffic analysis

Protocol statistical features

Flow-level statistical features

The Internet of Things (IoT) refers to a network system that connects various physical devices and objects through the internet, enabling communication and interaction between devices. IoT devices can be connected to the internet through wireless networks such as Wi-Fi, Bluetooth, Zigbee, or wired networks like Ethernet, facilitating functions like remote monitoring, data collection, control, and management.

IoT devices are currently being deployed on a massive scale. According to the Global IoT Device Quantity Forecast Report released by IoT Analytics in 2023 [1], the global number of IoT devices is projected to exceed 29 billion by 2027, approximately five times the 6 billion devices recorded in 2017. This rapid growth indicates a significant expansion in the market size of IoT devices, with the total market size expected to reach 1.1 trillion US dollars by 2026 [2]. IoT devices are widely used in industries such as smart homes, smart healthcare, smart agriculture, and smart manufacturing, bringing convenient, efficient, and intelligent solutions to people in a variety of ways. They play an especially important role in small office and home environments [3].

While the widespread use of IoT devices has brought convenience to people, it has also brought many security issues. On the one hand, most operators focus more on practicality than security when designing IoT devices, resulting in security vulnerabilities and privacy leaks in IoT devices [4]. On the other hand, the limited computing and storage resources of IoT devices make it difficult to deploy traditional defense measures, making them increasingly vulnerable to cyber attacks. Moreover, most users lack professional security knowledge, which makes it easier for attackers to exploit these vulnerabilities [5]. The serious consequences caused by these security issues have been verified in practice. For instance, in November 2022, the cybersecurity company Quarkslab discovered and reported a TPM2.0 vulnerability that could affect billions of IoT devices [6]. Additionally, the European Union Agency for Cybersecurity reported that since the Russo-Ukrainian War, there have been more and more cases of hackers exploiting vulnerabilities in IoT devices to carry out network attacks [7]. Moreover, the CSIS report stated that in 2023 there were several cyber attack incidents targeting IoT devices, including several European banks suffering DDoS attacks, US federal government agencies suffering global network attacks, and China’s nuclear industry suffering spy activity attacks [8].

Cyber security incidents of IoT devices are frequent, which requires network administrators to identify devices in the network environment so that necessary measures such as updating, restricting, and isolating devices with security vulnerabilities can be taken. In particular, device identification serves as the foundation and critical step in network security operations such as asset management, vulnerability response, and situational awareness [9]. It plays a crucial role in protecting network security, enhancing network reliability, safeguarding privacy, and facilitating device management, among other critical aspects. Therefore, identifying IoT devices is of paramount importance in improving network security [10].

In the context of addressing these security concerns, this paper proposes an IoT device identification method based on network traffic analysis. Specifically, the method involves extracting features from the headers of network packets generated by devices over a short period. This process yields protocol statistical features and flow-level statistical features. These two types of features reflect unique information about devices from both a global and local perspective, enabling accurate and efficient identification of IoT device categories and types. In addition, this method enhances feature extraction efficiency while safeguarding data privacy.

Network traffic is composed of multiple network packets, each of which typically consists of three parts: Header, Payload, and Trailer. The Header contains various control information and metadata, such as source and destination addresses, protocol version, packet length, and checksum. The Payload is the actual data carried by the packet and can be of any type, including text, images, audio, video, and more. The Trailer is used to provide additional control information or to verify data integrity (for example, with a checksum).
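To make the header fields listed above concrete, the following minimal sketch decodes the fixed 20-byte part of an IPv4 header with Python's standard `struct` module. It is only an illustration, not part of the proposed method: the function name `parse_ipv4_header` and the hand-built example bytes are assumptions introduced here.

```python
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (payload is never read)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Network byte order: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,             # upper 4 bits
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is counted in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                   # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


# Hypothetical header: IPv4, IHL 5, total length 40, TTL 64, protocol TCP.
example = bytes([0x45, 0x00, 0x00, 0x28, 0x00, 0x01, 0x00, 0x00,
                 0x40, 0x06, 0x00, 0x00, 192, 168, 0, 10, 10, 0, 0, 1])
fields = parse_ipv4_header(example)
print(fields)
```

Everything returned here comes from the header alone, which is the kind of information a header-only identification method can rely on.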

Methods for identifying IoT devices based on network traffic extract numerical or character features relevant to individual devices from the devices' network traffic and use them for classification. Current methods mainly inspect the header and payload and are usually divided into four types: port-based identification, statistical identification, behavioral identification, and payload-based identification [11]. Port-based identification associates known port numbers with traffic types and only checks the header information of data packets, enabling rapid classification. However, because it relies solely on port number matching, its accuracy is relatively low. Statistical identification determines traffic types based on statistical features of traffic data, such as duration, interval time, capacity, and idle time. It does not require inspecting packet contents and exhibits good adaptability. Behavioral identification analyzes device communication behavior and activity patterns in the network, such as periodic background network traffic, the protocols used, and communication patterns. This method typically has low computational costs and high real-time capabilities, making it suitable for device identification and monitoring in large-scale network environments. However, device behavior patterns may change over time and in different environments, requiring continuous updates and adjustments to maintain accuracy. Payload-based identification judges traffic types based on packet payload; it offers high reliability but cannot handle encrypted packets. In addition, because it requires in-depth analysis of packets, its complexity and computational cost are high, and there is a risk of violating privacy policies and regulations. Table 1 compares these four methods for identifying IoT devices based on network traffic.
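As a small illustration of why port-based identification is fast but imprecise, the sketch below maps a few well-known destination ports to traffic types. The table `PORT_TO_SERVICE` and the function `classify_by_port` are hypothetical names chosen for this example and are not taken from any of the cited methods.

```python
# Hypothetical lookup table of well-known ports; real deployments use larger lists.
PORT_TO_SERVICE = {
    53: "DNS",
    67: "DHCP",
    69: "TFTP",
    80: "HTTP",
    123: "NTP",
    443: "HTTPS",
    1883: "MQTT",
    1900: "SSDP",
    5353: "mDNS",
}


def classify_by_port(dst_port: int) -> str:
    """Return the traffic type associated with a destination port, if known."""
    return PORT_TO_SERVICE.get(dst_port, "unknown")


print(classify_by_port(443))   # a standard port is classified with one lookup
print(classify_by_port(8080))  # the same service on a non-standard port is missed
```

The lookup needs only one header field, but any service running on a non-standard port falls through to "unknown", which matches the low accuracy noted above.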

IoT Sentinel [12] is one of the earliest research efforts to use network traffic features for IoT device identification. This method can restrict communication for devices vulnerable to attacks and has relatively low performance overhead. However, due to its use of only behavior-based identification methods, its accuracy is somewhat limited. To improve the accuracy of device identification based on network traffic, existing research has adopted more comprehensive strategies, combining two or more methods from port-based, statistical identification, behavioral identification, and payload-based methods, while using a large number of features or packet information to generate fingerprints.

Kahraman et al. [13] proposed a general IoTDevID method that extracts 115 features from network packets and uses 6 scoring techniques to select 30 effective features, achieving an accuracy of 83.30% on the IoT Sentinel dataset. Likewise, Chowdhury et al. [14] extracted 218 features from a single packet header on the same dataset, achieving an accuracy of 83.35%. Additionally, Hamad et al. [15] extracted 67 features based on the behavioral characteristics and flow-level characteristics of packets, achieving an accuracy of 90.3% on the same dataset. Similarly, Aksoy et al. [16] proposed the SysID method, which detects 212 features from individual packets originating from devices and uses a genetic algorithm to select 33 effective features, achieving an accuracy of over 95% on the same dataset. However, this method reported results on only part of the dataset (23 out of 27 device types).

Although the use of a multitude of features or packet information can enhance the accuracy of IoT device identification, the feature extraction process is time-consuming. Furthermore, selecting the optimal feature subset from a large number of features requires empirical analysis and mathematical computation, which incurs additional computational cost. Additionally, some research employs features that are susceptible to deception (such as IP or MAC addresses) or the payload of packets (which may contain sensitive information), thereby potentially raising privacy and security concerns in data processing.

The IoTDevID [13] method involves analyzing and statistically processing payload data to obtain payload entropy. On the other hand, the methods of Chowdhury et al. [14] and Aksoy et al. [16] require the addition of IP addresses to improve identification rates. Similarly, the approach by Ammar et al. [17] necessitates parsing information shared by network protocols to identify devices, including device functionalities, locations, names, and service descriptions. It is worth highlighting that device location information is sensitive and may lead to privacy concerns. Additionally, Ammar et al. [18] also differentiate device types based on flow-level characteristics and payload text features. Meanwhile, IoTSense [19] identifies device types based on protocol features and payload-related characteristics. The method by Sivanathan et al. [20] includes the cipher suites of the TLS handshake phase as features, which requires deep packet inspection to determine the supported cipher suites and the finally selected encryption parameters. Table 2 presents an overview of related work, comparing the various device fingerprint identification methods in terms of classification accuracy, difficulty of feature extraction, and data privacy and security.

There are two ways to acquire traffic data: active probing and passive monitoring, which correspond to two types of device fingerprinting techniques [21]. Active device fingerprinting techniques acquire device information through active scanning and probing. In contrast, passive device fingerprinting techniques acquire device information by monitoring the communication traffic characteristics of devices. This paper adopts a passive approach, eliminating the need for active cooperation from the devices. This approach avoids interference with the network, making it more widely applicable and feasible.

The overall framework of the proposed method is shown in Fig. 1. First, protocol statistical features and flow-level statistical features are extracted from the short-term traffic data of devices, and then these two types of features are vectorized and fed into the random forest model for training. The trained model is then used for device identification. The protocol statistical features use the Bag of Words (BoW) [22] idea, which takes the top-level protocol types of device traffic as the vocabulary and constructs a word vector for each device according to the vocabulary to represent the device’s protocol usage. Flow-level statistical features are derived from the bidirectional flow information of devices: the device traffic is taken as input, and statistics such as flow size, duration, and transmission rate are computed.

To fully leverage packet header information and avoid deep inspection of packet payloads, our method combines three identification approaches: port-based identification, statistical identification, and behavioral identification. This combination not only capitalizes on the strengths of each method but also overcomes the limitations of payload-based methods. The feature combination for the proposed method is illustrated in Fig. 2, which details the sources of features for different methods and the feature combination employed in the proposed method.

3.1 Feature Extraction

This paper selects two types of features from device traffic for IoT device identification: protocol statistical features and flow-level statistical features. Protocol statistical features are global features, which are the frequency of the top-level protocol of packets in device traffic, providing a grasp of the overall behavior of the device. Flow-level statistical features are local features, which are the statistical characteristics of multiple bidirectional flows in device traffic, providing device-specific detailed information. By integrating global and local features, this paper constructs an efficient IoT device identification model. Table 3 lists the protocol statistical features and flow-level statistical features (a total of 22) used by the proposed method, all of which are derived from the header of network packets, do not include easily tampered IP or MAC addresses, and do not contain packet payloads that may carry sensitive information, thus ensuring data privacy and security.

3.1.1 Protocol Statistical Features

The proposed method combines protocol feature flags generated by devices at the link layer, network layer, transport layer, and application layer [23], along with the concept of BoW, to extract the frequency of the top-level protocols in device traffic, forming protocol-based statistical features. BoW is a text representation method that disregards text order and grammar structure, focusing solely on word frequency. As shown in Fig. 3, BoW tokenizes each sentence in the original document to build a vocabulary (Words). It then converts each sentence into a vector based on this vocabulary, which is used for subsequent machine learning tasks.

Building upon the BoW concept, this paper analyzes the top-level protocol information of short-term traffic of devices and selects four representative devices for display, as shown in Fig. 4. The devices in the figure are labeled as: (a) Aria Smart Scale, which measures user weight and body fat percentage; (b) D-Link Sensor, an intelligent motion sensor that detects and reports user motion status and environmental information; (c) EdnetCam, an intelligent camera device capable of remote monitoring and two-way communication; (d) TP-Link Plug HS100, a smart plug device that can schedule power on/off and monitor power usage of the socket.

From Fig. 4, it can be observed that there are noticeable differences in the top-level protocol packets among different devices in terms of types, frequencies, order of occurrence, and the number of continuous uses. Specifically, for Aria and EdnetCam, there are fewer protocol types, lower protocol usage frequencies, and infrequent consecutive use of specific protocols. In contrast, for D-LinkSensor and TP-LinkPlugHS100, there are more protocol types, higher protocol usage frequencies, and frequent consecutive usage of certain protocols. According to the BoW concept, the proposed method disregards the order of occurrence and the consecutive use of protocols in data packets, focusing solely on protocol frequency as protocol statistical features. Fig. 5 illustrates the protocol statistical features of the four IoT devices shown in Fig. 4.

It can be seen from Fig. 5 that different devices exhibit significant differences in terms of protocol types and protocol usage frequencies. Aria has few protocol types, and its protocol usage frequency is low, with the most frequent protocol, TCP, appearing only 19 times. This is because its primary function is weight measurement, and it does not actively generate traffic or communicate with external networks. D-LinkSensor has a greater variety of protocol types and higher protocol usage frequencies. It uses a total of 12 protocol types, with TCP, HTTP, and HTTPS each appearing more than 100 times and HTTPS appearing as many as 219 times. This is because it needs to regularly send heartbeat signals or maintain lightweight connections with the network so that it can quickly respond and initiate motion detection or security functions when necessary. EdnetCam has the fewest protocol types and the lowest protocol usage frequency, and it uses the TFTP protocol, which the other devices do not use. TFTP allows the EdnetCam device to download necessary firmware or configuration files from a remote server, saving device memory and storage space. In contrast to the other devices, TP-LinkPlugHS100 uses TCP far more frequently than any other protocol, because the smart plug establishes a stable and reliable connection to perform on/off operations. To meet communication requirements such as transmission efficiency, real-time performance, and reliability, IoT devices choose protocols that align with their needs. Therefore, protocol types and protocol usage frequencies differ among IoT devices.
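The protocol statistical features described in this subsection can be sketched in a few lines of Python. This is a minimal illustration of the BoW idea applied to protocol names, not the authors' implementation: the function names, the device labels `device_a` and `device_b`, and the protocol sequences are all invented for the example.

```python
from collections import Counter


def build_vocabulary(traces):
    """Sorted list of every top-level protocol seen in any device trace."""
    return sorted({proto for trace in traces.values() for proto in trace})


def protocol_vector(trace, vocabulary):
    """Count how often each vocabulary protocol occurs; packet order is ignored."""
    counts = Counter(trace)
    return [counts[proto] for proto in vocabulary]


# Hypothetical per-packet top-level protocols for two devices.
traces = {
    "device_a": ["ARP", "DHCP", "DNS", "TCP", "TCP", "HTTP"],
    "device_b": ["DHCP", "DNS", "TCP", "TLS", "TCP", "TLS", "TLS"],
}

vocab = build_vocabulary(traces)
vectors = {name: protocol_vector(trace, vocab) for name, trace in traces.items()}

print(vocab)
for name, vec in vectors.items():
    print(name, vec)
```

Because only counts are kept, reversing or shuffling a trace yields the same vector, which is exactly the information the BoW representation discards.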

3.1.2 Flow-Level Statistical Features

Protocol statistical features only reflect the global information of devices, neglecting the contextual relationships between protocols. Thus, relying solely on these features cannot fully distinguish devices. To enhance device identification accuracy, this paper also employs flow-level statistical features that can reveal detailed information about devices during data transmission. Flow-level statistical features include the flow size (Volume), duration (Duration), average flow rate (Rate), average Time To Live (TTLM), average TCP data offset (TCPdataofs), and average TCP window size (TCPWM) of the bidirectional flows of device traffic. Among these, TTL (Time To Live) represents the maximum number of hops a packet is allowed to traverse, serving as a constraint on the packet's lifespan within the network. Different devices may have varying TTL values due to factors such as the operating system, router settings, and network topology. TCP window size, which denotes the amount of data the receiver is willing to accept before an acknowledgment is required, reflects the device's memory and processing speed. These two factors vary based on the functional requirements and design of IoT devices. TCP data offset gives the length of the TCP header in 32-bit words, and the length of a device's packet header varies with the options and protocols it uses. For instance, some devices may employ encryption or compression techniques to secure or optimize data transmission, which can impact the length of the packet header.
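A minimal sketch of how such flow-level statistics could be computed from header fields is given below. Several details are assumptions made for illustration and are not specified in this paper: the `Packet` record and its field names, grouping packets into bidirectional flows by an order-independent endpoint key, measuring Rate in bytes per second, and reporting 0 when a flow carries no TCP fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Packet:
    """Hypothetical header-only record of one captured packet."""
    ts: float                      # capture time in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    length: int                    # packet length in bytes
    ttl: int
    tcp_dataofs: Optional[int] = None   # TCP header length in 32-bit words
    tcp_window: Optional[int] = None


def flow_key(p: Packet) -> tuple:
    """Same key for both directions; addresses only group packets, they are not features."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def flow_features(packets):
    """Compute Volume, Duration, Rate, TTLM, TCPdataofs and TCPWM per bidirectional flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    features = {}
    for key, pkts in flows.items():
        volume = sum(p.length for p in pkts)
        duration = max(p.ts for p in pkts) - min(p.ts for p in pkts)
        offsets = [p.tcp_dataofs for p in pkts if p.tcp_dataofs is not None]
        windows = [p.tcp_window for p in pkts if p.tcp_window is not None]
        features[key] = {
            "Volume": volume,
            "Duration": duration,
            "Rate": volume / duration if duration > 0 else 0.0,
            "TTLM": mean(p.ttl for p in pkts),
            "TCPdataofs": mean(offsets) if offsets else 0.0,
            "TCPWM": mean(windows) if windows else 0.0,
        }
    return features


# Hypothetical three-packet TCP exchange between one device and one server.
packets = [
    Packet(0.0, "10.0.0.2", "10.0.0.1", 40000, 80, "TCP", 60, 64, 5, 4000),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 80, 40000, "TCP", 100, 64, 5, 8000),
    Packet(2.0, "10.0.0.2", "10.0.0.1", 40000, 80, "TCP", 40, 64, 8, 6000),
]
result = flow_features(packets)
flow = next(iter(result.values()))
print(flow)
```

How the per-flow values are aggregated into one feature vector per device sample is not shown here, since that step depends on choices this section does not spell out.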

To explore the differences in flow-level statistical features among different devices, this paper selects TTLM, TCPdataofs, and TCPWM for device analysis. In order to provide a comprehensive analysis of these features, Fig. 6 presents the probability distribution of flow-level statistical features of short-term traffic for three types of IoT devices. The three devices are: D-LinkCam, a smart camera with remote control, management, and two-way voice communication capabilities; D-LinkDoorSensor, a device that can remotely monitor the status of doors and windows; and HomeMaticPlug, a smart socket that can remotely control and intelligently manage devices.

As shown in Fig. 6(a), different IoT devices exhibit distinct preferences for TTLM. For D-LinkCam, TTLM mostly falls within [120, 140] hops, while D-LinkDoorSensor and HomeMaticPlug are around 60 hops and 90 hops, respectively. From Fig. 6(b), it is evident that these IoT devices also differ in their TCPdataofs. D-LinkCam predominantly concentrates at 8 (in 32-bit words), whereas D-LinkDoorSensor and HomeMaticPlug are primarily at 6 and 5, respectively. From Fig. 6(c), it can be seen that the average TCP window size of different IoT devices shows obvious dispersion, with each device having its own characteristic range. The TCPWM of D-LinkCam is mainly concentrated in [3500, 4000] and [6500, 7500] bytes, while that of D-LinkDoorSensor falls within [4000, 6000] bytes, and that of HomeMaticPlug is around 9000 bytes. These differences emphasize that flow-level statistical features are crucial determinants for device identification.

The hardware setup for the experiments in this paper consisted of a computer running Windows 10 with 16GB of RAM and an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz 2.70 GHz processor. The software environment used was PyCharm 2022. Two real-world datasets containing data from IoT smart home or small office devices were utilized to test the proposed method, aiming to validate its effectiveness.

4.1 Datasets

(1) Dataset 1: IoT Sentinel

The IoT Sentinel dataset [12] contains network traffic data generated during the initialization process of 27 types of IoT devices, with the initialization process of each type of device repeated at least 20 times to ensure data sufficiency and reliability. The dataset covers common devices in smart homes, such as cameras, health monitors, smart plugs, and smart sensors. Detailed information about the IoT Sentinel dataset is shown in Table 4, including the manufacturer name (Manufacturer), the number of samples for each type of device (Samples), the total number of packets for each device (Number of packets), and the connection methods used (Connection Methods). Among them, ‘Wired / Wireless’ indicates that the device supports both wired and wireless connection methods; ‘*’ indicates that the device comes from the same manufacturer as the row above.

(2) Dataset 2: UNSW

The UNSW dataset [20] contains a total of 28 smart devices, including 21 IoT devices and 7 non-IoT devices, and records the network traffic of these devices for 20 consecutive days. The types of IoT devices include network cameras, smart switches, air quality sensors, printers, smart speakers, smart light bulbs, smart photo frames, and healthcare devices. For the purpose of this study, only the network traffic of IoT devices from this dataset is retained. The detailed information about IoT devices in the UNSW dataset is presented in Table 5, which follows the same structure as Table 4.

Table 4 IoT Sentinel dataset details

Manufacturer	Device Type	Samples	Number of packets	Connection Methods
Aria	Aria	20	942	Wireless
HomeMaticPlug	HomeMaticPlug	20	1061	Wireless
Withings	Withings	20	1362	Wireless
MAXGateway	MAXGateway	20	1155	Wired
Philips Hue	HueBridge	20	26944	Wired / Wireless
*	HueSwitch	20	38975	Wireless
Ednet	EdnetGateway	20	1405	Wireless
*	EdnetCam	20	433	Wired / Wireless
Edimax	EdimaxCam	20	876	Wired / Wireless
*	EdimaxPlug1101W	20	1756	Wireless
*	EdimaxPlug2101W	20	1637	Wireless
Lightify	Lightify	20	7401	Wireless
Belkin	WeMoInsightSwitch	25	9747	Wireless
*	WeMoLink	20	10978	Wireless
*	WeMoSwitch	25	7453	Wireless
D-Link	D-LinkHomeHub	20	15858	Wired / Wireless
*	D-LinkDoorSensor	25	3776	Wireless
*	D-LinkDayCam	20	1215	Wired / Wireless
*	D-LinkCam	20	7454	Wireless
*	D-LinkSwitch	20	12930	Wireless
*	D-LinkWaterSensor	20	12078	Wireless
*	D-LinkSiren	20	11793	Wireless
*	D-LinkSensor	20	12671	Wireless
TP-Link	TP-LinkPlugHS110	20	1209	Wireless
*	TP-LinkPlugHS100	20	1332	Wireless
Smarter	SmarterCoffee	20	222	Wireless
*	iKettle2	20	208	Wireless

Table 5 UNSW dataset details (IoT devices)

Manufacturer	Device Type	Samples	Number of packets	Connection Methods
SmartThings	Smart Things	20	1972	Wired
AmazoneEcho	Amazon Echo	20	3819	Wireless
Netatmo	Netatmo Welcome	20	1552	Wireless
*	Netatmo weather station	20	1701	Wireless
TP-Link	TP-Link Smart plug	15	246	Wireless
*	TP-Link Day Night Cloud camera	13	621	Wireless
SamsungCam	Samsung SmartCam	20	3440	Wireless
Google Nest	Dropcam	20	11619	Wireless
*	NEST Protect smoke alarm	19	3960	Wireless
InsteonCamera	Insteon Camera	15	2999	Wired / Wireless
Withings	Withings Smart Baby Monitor	15	2491	Wired
*	Withings Aura smart sleep sensor	15	1651	Wireless
*	Withings Smart scale	20	3251	Wireless
Belkin	Belkin Wemo switch	20	2577	Wireless
*	Belkin wemo motion sensor	20	3127	Wireless
iHome	iHome	9	689	Wireless
Blipcare	Blipcare Blood Pressure meter	3	172	Wireless
LifX	LiFX Smart Bulb	15	883	Wireless
TribySpeaker	Triby Speaker	20	702	Wireless
PIX-STAR	PIX-STAR Photo-frame	13	311	Wireless
HP Printer	HP Printer	20	702	Wireless

4.2 Performance Metrics

To assess the performance of the proposed method, this paper selects four performance metrics (recall, precision, accuracy, and F1 score) together with the confusion matrix.
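The equations for these metrics did not survive in this version of the text. Assuming the paper uses the standard definitions, which agree with the verbal descriptions that follow, they can be written per class (treating that class as positive) as:

```latex
\begin{align}
\mathrm{Recall}    &= \frac{TP}{TP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
F_1 &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{align}
```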

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. Recall, also known as the true positive rate, is the proportion of correctly classified positive samples to the actual number of positive samples. The higher the value, the more positive samples the model can find. Precision is the proportion of correctly classified positive samples to all predicted positive samples. The higher the value, the more accurate the model’s judgment of positive samples. Accuracy is the proportion of correctly classified samples to the total number of samples. The higher the value, the more accurate the model’s overall judgment. The F1 score is the harmonic mean of precision and recall, so it takes both into account. The higher the F1 score, the more robust the model.
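For a multi-class problem such as device identification, these quantities can be read off the confusion matrix one class at a time. The sketch below is a plain-Python illustration under the assumption that rows are true classes and columns are predicted classes; the function name `per_class_metrics` and the 2x2 example matrix are invented, and how per-class values are averaged into the single figures reported later is not specified here.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k samples predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    # Overall accuracy: correctly classified samples over all samples.
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return per_class, accuracy


# Hypothetical confusion matrix for two device classes.
cm = [[8, 2],
      [1, 9]]
per_class, accuracy = per_class_metrics(cm)
print(per_class, accuracy)
```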

4.3 Experimental Results and Analysis

To evaluate the effectiveness of the proposed method, this paper conducts experiments on two datasets (IoT Sentinel and UNSW) and compares its results with other methods. The experiments include identifying device categories (IoT device manufacturers) and device types (specific device models). The proposed method extracts device fingerprints from device network traffic data, obtaining 550 fingerprints representing 27 types of devices from IoT Sentinel and 352 fingerprints representing 21 types of devices from UNSW.

Many researchers in the field have explored the use of multiple machine learning models to effectively utilize heterogeneous features extracted from various perspectives and selected the best-performing model for IoT device identification. Currently, widely used models include Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (kNN). Among these models, the Random Forest model has demonstrated superior performance in many studies. Therefore, this paper directly chooses Random Forest as the classification model.

As described in Section 3, this study focuses on short-term device traffic. IoT Sentinel records device access (initialization) traffic, which is inherently short-term. In contrast, UNSW covers the devices' traffic over entire days, so short-term traffic must be extracted from it. To obtain short-term device traffic samples that suit the research needs, this study extracted from the UNSW dataset the traffic within a few minutes after a device starts communicating and determined the best segmentation time by comparing different segmentation thresholds. The evaluation indicators and feature extraction time corresponding to different segmentation thresholds are shown in Fig. 7.

Fig. 7(a) shows how the evaluation indicators (Recall, Precision, Accuracy, and F1 score) change with the segmentation threshold. While the threshold is below 4 minutes, all evaluation metrics rise as the threshold increases. Once the threshold exceeds 4 minutes, the metrics increase only slightly and remain essentially flat. Fig. 7(b) presents the feature extraction time corresponding to different traffic segmentation thresholds: as the threshold increases, the time required for feature extraction also increases. Although feature extraction takes slightly longer at 4 minutes than at 3 minutes, the corresponding improvement in the evaluation metrics is larger. Therefore, the segmentation threshold is set at 4 minutes.
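The segmentation step itself is simple to express in code. The following sketch keeps only the packets that fall within a given threshold of the first packet of a capture. It assumes that "short-term" is measured from the earliest timestamp and that packets are given as `(timestamp_seconds, record)` pairs; both the function name `short_term_slice` and the sample data are hypothetical.

```python
def short_term_slice(packets, threshold_minutes=4.0):
    """Return the packets captured within `threshold_minutes` of the earliest one.

    `packets` is an iterable of (timestamp_seconds, record) pairs.
    """
    ordered = sorted(packets, key=lambda p: p[0])
    if not ordered:
        return []
    cutoff = ordered[0][0] + threshold_minutes * 60
    return [p for p in ordered if p[0] <= cutoff]


# Hypothetical capture: timestamps in seconds since the first packet.
capture = [(0, "a"), (100, "b"), (239, "c"), (241, "d"), (600, "e")]
four_min = short_term_slice(capture)          # default threshold of 4 minutes
three_min = short_term_slice(capture, 3.0)
print([r for _, r in four_min], [r for _, r in three_min])
```

Repeating this for several thresholds and re-running feature extraction and classification on each slice would give the kind of comparison summarized in Fig. 7.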

4.3.1 Device Category Identification

This paper classifies devices according to the manufacturer name and device name (a single device from the manufacturer), and the IoT Sentinel dataset and UNSW dataset are divided into 12 and 15 categories, respectively. A breakdown of the specific categories can be found in Table 4 and Table 5. The features extracted from each category are input into the classification model of the proposed method, and the resulting confusion matrices for each category are depicted in Fig. 8.

From Fig. 8(a), it can be seen that in the IoT Sentinel dataset, with the exception of 0.3% of the Edimax category being misclassified as Ednet, all device categories were correctly identified (100% accuracy). From Fig. 8(b), it can be seen that out of the 15 device categories in the UNSW dataset, 12 were correctly identified, and one of the remaining categories still achieved an accuracy of 99%. These results demonstrate that the proposed method not only achieves a high classification rate in identifying manufacturers (device categories) but also exhibits strong generalization ability, making it a reliable approach for IoT device classification across a variety of categories.

4.3.2 Device Type Identification

To verify the classification performance of the proposed method for device types, corresponding experiments were conducted on the IoT Sentinel dataset and the UNSW dataset, which were divided into 27 and 21 device types, respectively. The detailed experimental results can be found in Table 6.

Table 6 Performance metrics of the proposed method

From Table 6, it can be observed that the proposed method performs well on both datasets. On the IoT Sentinel dataset, recall, precision, accuracy, and F1 score all exceed 91%, indicating a high level of classification accuracy. Similarly, on the UNSW dataset, these metrics all exceed 98%. It's worth noting that the performance of the method on the UNSW dataset is significantly higher than that on the IoT Sentinel dataset. This disparity can be attributed to the composition of the datasets. The IoT Sentinel dataset contains similar devices from the same manufacturer with similar purposes, such as TP-LinkPlugHS110 and TP-LinkPlugHS100, making it challenging to completely distinguish these devices based solely on the network traffic during their initialization processes.

4.3.3 Comparison with Existing Methods

To evaluate the feasibility and effectiveness of the proposed method, a comparison was made with the previously published methods [12], [13], and [19]. The comparison results can be seen in Table 7 and Table 8.

Table 7 displays the various indicators of device category identification by different methods. Upon examination of Table 7, it is evident that the proposed method performs well on both datasets, achieving outstanding results with all indicators reaching 99%. Notably, the other three methods achieved relatively high identification results on both datasets, with accuracy rates exceeding 92%. Table 8 shows the various indicators of device type identification by different methods. Upon reviewing Table 8, it can be seen that the proposed method performs well on both datasets, especially on the UNSW dataset, where all indicators exceed 98%. Similarly, the other three methods also performed well on the UNSW dataset, with accuracy rates exceeding 91%. In summary, when compared with other methods, the proposed method consistently outperforms on different datasets, achieving superior identification results. These findings underscore the feasibility and effectiveness of the proposed approach for IoT device identification and classification.

This paper proposed an IoT device identification method based on network traffic. The method extracts protocol statistical features and flow-level statistical features from a device's network traffic over a short period of time to identify device categories and types. These features are easy to capture from packet headers, which avoids in-depth inspection of packet payloads, improves feature extraction efficiency, and reduces computational complexity. In addition, the selected features do not involve easily tampered-with IP addresses or payloads, which helps protect data privacy and security. The method effectively identified IoT devices on two different datasets, demonstrating its wide applicability.
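As a rough illustration of the two feature families summarized above, the following sketch computes protocol statistical features and flow-level statistical features from header-only packet records. The record layout (`time`, `size`, `protocols`), the protocol list, and the specific statistics are assumptions made for this example and are not the paper's actual feature set; note that no IP address or payload field is used.

```python
from statistics import mean, pstdev

# Hypothetical protocol list; the paper's real list may differ.
PROTOCOLS = ("ARP", "DNS", "DHCP", "TCP", "UDP")

def protocol_features(packets, protocols=PROTOCOLS):
    """Fraction of packets in the window whose headers carry each protocol."""
    total = len(packets)
    return {
        f"frac_{proto}": (
            sum(proto in pkt["protocols"] for pkt in packets) / total
            if total else 0.0
        )
        for proto in protocols
    }

def flow_features(packets):
    """Simple flow-level statistics over packet sizes and inter-arrival times."""
    sizes = [pkt["size"] for pkt in packets]
    times = sorted(pkt["time"] for pkt in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "pkt_count": len(packets),
        "mean_size": mean(sizes) if sizes else 0.0,
        "std_size": pstdev(sizes) if sizes else 0.0,
        "mean_gap": mean(gaps) if gaps else 0.0,
    }

# Toy capture window: header fields only, no IP addresses, no payloads.
window = [
    {"time": 0.0, "size": 60, "protocols": {"ARP"}},
    {"time": 0.5, "size": 100, "protocols": {"UDP", "DNS"}},
    {"time": 1.5, "size": 80, "protocols": {"TCP"}},
    {"time": 2.0, "size": 80, "protocols": {"TCP"}},
]
features = {**protocol_features(window), **flow_features(window)}
```

The resulting dictionary is one feature vector per capture window; in a pipeline of this kind it would be passed to a classifier trained on labeled device traffic.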

Acknowledgments

This work was supported by the Project of the Key Laboratory of Wireless Sensor Networks in University of Sichuan Province (WSN2022001).

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this manuscript.

Sinha S (2023) State of IoT 2023: Number of connected IoT devices growing 16% to 16.7 billion globally. https://iot-analytics.com/number-connected-iot-devices/.
IDC (2023) Worldwide Spending on the Internet of Things is Forecast to Surpass $1 Trillion in 2026, According to a New IDC Spending Guide.
Marchal S, Miettinen M, Nguyen TD, Sadeghi A-R, Asokan N (2019) AuDI: Toward Autonomous IoT Device-Type Identification Using Periodic Communication. IEEE Journal on Selected Areas in Communications 37:1402-1412. https://doi.org/10.1109/jsac.2019.2904364
Kolias C, Kambourakis G, Stavrou A, Voas J (2017) DDoS in the IoT: Mirai and Other Botnets. Computer 50:80-84. https://doi.org/10.1109/MC.2017.201
Alrawi O, Lever C, Antonakakis M, Monrose F (2019) SoK: Security Evaluation of Home-Based IoT Deployments. 2019 IEEE Symposium on Security and Privacy (SP):1362-1380. https://doi.org/10.1109/SP.2019.00013
Lakshmanan R (2023) New Flaws in TPM 2.0 Library Pose Threat to Billions of IoT and Enterprise Devices. https://thehackernews.com/2023/03/new-flaws-in-tpm-20-library-pose-threat.html.
ENISA (2022) ENISA Threat Landscape 2022. https://www.enisa.europa.eu/publications/enisa-threat-landscape-2022.
CSIS (2022) Significant Cyber Incidents. https://www.csis.org/programs/strategic-technologies-program/significant-cyber-incidents.
Du R, Wang J, Li S, Liu W (2022) A Lightweight Flow Feature-Based IoT Device Identification Scheme. Sec and Commun Netw 2022:10. https://doi.org/10.1155/2022/8486080
Chowdhury RR, Idris AC, Abas PE (2023) A Deep Learning Approach for Classifying Network Connected IoT Devices Using Communication Traffic Characteristics. Journal of Network and Systems Management 31:26. https://doi.org/10.1007/s10922-022-09716-x
Tahaei H, Afifi F, Asemi A, Zaki F, Anuar NB (2020) The rise of traffic classification in IoT networks: A survey. J Netw Comput Appl 154:102538
Miettinen M, Marchal S, Hafeez I, Asokan N, Sadeghi A-R, Tarkoma S (2017) IoT SENTINEL: Automated Device-Type Identification for Security Enforcement in IoT. Paper presented at the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Kostas K, Just M, Lones MA (2022) IoTDevID: A Behavior-Based Device Identification Method for the IoT. IEEE Internet of Things Journal 9:23741-23749
Chowdhury RR, Aneja S, Aneja N, Abas E (2020) Network Traffic Analysis based IoT Device Identification. Paper presented at the Proceedings of the 2020 4th International Conference on Big Data and Internet of Things
Hamad SA, Zhang WE, Sheng QZ, Nepal S (2019) IoT Device Identification via Network-Flow Based Fingerprinting and Learning. Paper presented at the 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)
Aksoy A, Gunes MH (2019) Automated IoT Device Identification using Network Traffic. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), 20-24 May 2019, pp 1-7. https://doi.org/10.1109/ICC.2019.8761559
Ammar N, Noirie L, Tixeuil S (2019) Network-Protocol-Based IoT Device Identification. In: 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 10-13 June 2019, pp 204-209. https://doi.org/10.1109/FMEC.2019.8795318
Ammar N, Noirie L, Tixeuil S (2020) Autonomous Identification of IoT Device Types based on a Supervised Classification. ICC 2020 - 2020 IEEE International Conference on Communications (ICC):1-6
Bezawada B, Bachani M, Peterson J, Shirazi H, Ray I, Ray I (2018) IoTSense: Behavioral Fingerprinting of IoT Devices. ArXiv abs/1804.03852
Sivanathan A, Gharakheili HH, Loi F, Radford A, Wijenayake C, Vishwanath A, Sivaraman V (2019) Classifying IoT Devices in Smart Environments Using Network Traffic Characteristics. IEEE Transactions on Mobile Computing 18:1745-1759. https://doi.org/10.1109/TMC.2018.2866249
LN F, CL L, YC W, YC W, ZL W, H L, JH Y Survey on IoT Device Identification and Anomaly Detection. Ruan Jian Xue Bao/Journal of Software (in Chinese):1-21. https://doi.org/10.13328/j.cnki.jos.006818
Nguyen-Duc H, Do-Hong T, Le-Tien T, Bui-Thu C (2013) A survey of classification accuracy using multifeatures and multi-kernels. 2013 International Conference on Advanced Technologies for Communications (ATC 2013):661-666
Fan L, Zhang S, Wu Y, Wang Z, Duan C, Li J, Yang J (2020) An IoT Device Identification Method based on Semi-supervised Learning. In: 2020 16th International Conference on Network and Service Management (CNSM), 2-6 Nov. 2020, pp 1-7. https://doi.org/10.23919/CNSM50824.2020.9269044

