The experiments in this paper were run on a computer with Windows 10, 16 GB of RAM, and an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (2.70 GHz) processor. The software environment was PyCharm 2022. To validate the effectiveness of the proposed method, it was tested on two real-world datasets containing traffic from IoT devices used in smart homes or small offices.
4.1 Datasets
(1) Dataset 1: IoT Sentinel
The IoT Sentinel dataset [12] contains network traffic generated during the initialization process of 27 types of IoT devices. The initialization process of each device type was repeated at least 20 times to ensure data sufficiency and reliability. The dataset covers common smart home devices, such as cameras, health monitors, smart plugs, and smart sensors. Detailed information about the IoT Sentinel dataset is shown in Table 4, including the manufacturer name (Manufacturer), the device type (Device Type), the number of samples for each device type (Samples), the total number of packets for each device (Number of packets), and the connection methods used (Connection Methods). In the table, ‘Wired / Wireless’ indicates that the device supports both wired and wireless connections, and ‘*’ indicates that the device comes from the same manufacturer as the row above.
(2) Dataset 2: UNSW
The UNSW dataset [20] covers a total of 28 smart devices, including 21 IoT devices and 7 non-IoT devices, and records the network traffic of these devices for 20 consecutive days. The IoT devices include network cameras, smart switches, air quality sensors, printers, smart speakers, smart light bulbs, smart photo frames, and healthcare devices. For the purpose of this study, only the network traffic of the IoT devices in this dataset is retained. Detailed information about the IoT devices in the UNSW dataset is presented in Table 5, which follows the same structure as Table 4.
Table 4 IoT Sentinel dataset details
| Manufacturer | Device Type | Samples | Number of packets | Connection Methods |
| --- | --- | --- | --- | --- |
| Aria | Aria | 20 | 942 | Wireless |
| HomeMaticPlug | HomeMaticPlug | 20 | 1061 | Wireless |
| Withings | Withings | 20 | 1362 | Wireless |
| MAXGateway | MAXGateway | 20 | 1155 | Wired |
| Philips Hue | HueBridge | 20 | 26944 | Wired / Wireless |
| * | HueSwitch | 20 | 38975 | Wireless |
| Ednet | EdnetGateway | 20 | 1405 | Wireless |
| * | EdnetCam | 20 | 433 | Wired / Wireless |
| Edimax | EdimaxCam | 20 | 876 | Wired / Wireless |
| * | EdimaxPlug1101W | 20 | 1756 | Wireless |
| * | EdimaxPlug2101W | 20 | 1637 | Wireless |
| Lightify | Lightify | 20 | 7401 | Wireless |
| Belkin | WeMoInsightSwitch | 25 | 9747 | Wireless |
| * | WeMoLink | 20 | 10978 | Wireless |
| * | WeMoSwitch | 25 | 7453 | Wireless |
| D-Link | D-LinkHomeHub | 20 | 15858 | Wired / Wireless |
| * | D-LinkDoorSensor | 25 | 3776 | Wireless |
| * | D-LinkDayCam | 20 | 1215 | Wired / Wireless |
| * | D-LinkCam | 20 | 7454 | Wireless |
| * | D-LinkSwitch | 20 | 12930 | Wireless |
| * | D-LinkWaterSensor | 20 | 12078 | Wireless |
| * | D-LinkSiren | 20 | 11793 | Wireless |
| * | D-LinkSensor | 20 | 12671 | Wireless |
| TP-Link | TP-LinkPlugHS110 | 20 | 1209 | Wireless |
| * | TP-LinkPlugHS100 | 20 | 1332 | Wireless |
| Smarter | SmarterCoffee | 20 | 222 | Wireless |
| * | iKettle2 | 20 | 208 | Wireless |
Table 5 UNSW dataset details (IoT devices)
| Manufacturer | Device Type | Samples | Number of packets | Connection Methods |
| --- | --- | --- | --- | --- |
| SmartThings | Smart Things | 20 | 1972 | Wired |
| AmazonEcho | Amazon Echo | 20 | 3819 | Wireless |
| Netatmo | Netatmo Welcome | 20 | 1552 | Wireless |
| * | Netatmo weather station | 20 | 1701 | Wireless |
| TP-Link | TP-Link Smart plug | 15 | 246 | Wireless |
| * | TP-Link Day Night Cloud camera | 13 | 621 | Wireless |
| SamsungCam | Samsung SmartCam | 20 | 3440 | Wireless |
| Google Nest | Dropcam | 20 | 11619 | Wireless |
| * | NEST Protect smoke alarm | 19 | 3960 | Wireless |
| InsteonCamera | Insteon Camera | 15 | 2999 | Wired / Wireless |
| Withings | Withings Smart Baby Monitor | 15 | 2491 | Wired |
| * | Withings Aura smart sleep sensor | 15 | 1651 | Wireless |
| * | Withings Smart scale | 20 | 3251 | Wireless |
| Belkin | Belkin Wemo switch | 20 | 2577 | Wireless |
| * | Belkin wemo motion sensor | 20 | 3127 | Wireless |
| iHome | iHome | 9 | 689 | Wireless |
| Blipcare | Blipcare Blood Pressure meter | 3 | 172 | Wireless |
| LifX | LiFX Smart Bulb | 15 | 883 | Wireless |
| TribySpeaker | Triby Speaker | 20 | 702 | Wireless |
| PIX-STAR | PIX-STAR Photo-frame | 13 | 311 | Wireless |
| HP Printer | HP Printer | 20 | 702 | Wireless |
4.2 Performance Metrics
To assess the performance of the proposed method, this paper uses four performance metrics (recall, precision, accuracy, and F1 score) together with the confusion matrix. The four metrics are defined as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. Recall, also known as the true positive rate, is the proportion of correctly classified positive samples among all actual positive samples. The higher the value, the more positive samples the model can find. Precision is the proportion of correctly classified positive samples among all samples predicted as positive. The higher the value, the more accurate the model’s judgment of positive samples. Accuracy is the proportion of correctly classified samples among all samples. The higher the value, the more accurate the model’s overall judgment. The F1 score is the harmonic mean of precision and recall, so it takes both into account. The higher the F1 score, the more robust the model.
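For a multi-class task such as device identification, TP, TN, FP, and FN are usually counted per class in a one-vs-rest manner and then averaged. The following is a minimal sketch of that computation from a confusion matrix. The paper does not state which averaging scheme it uses, so macro averaging, the function names, and the zero-denominator convention are all assumptions made here for illustration.

```python
from typing import Dict, List


def one_vs_rest_counts(cm: List[List[int]], k: int) -> Dict[str, int]:
    """Return TP, FP, FN, TN for class k, treating all other classes as negative.

    cm[i][j] is the number of samples with true class i and predicted class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true class k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # predicted as k, true class is another
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def _ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 for a zero denominator (an assumed convention)."""
    return num / den if den else 0.0


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged recall, precision, F1, plus overall accuracy (assumed scheme)."""
    n = len(cm)
    if n == 0:
        raise ValueError("confusion matrix must contain at least one class")
    total = sum(sum(row) for row in cm)
    recalls, precisions, f1s = [], [], []
    for k in range(n):
        c = one_vs_rest_counts(cm, k)
        recall = _ratio(c["TP"], c["TP"] + c["FN"])
        precision = _ratio(c["TP"], c["TP"] + c["FP"])
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(_ratio(2 * precision * recall, precision + recall))
    return {
        "recall": sum(recalls) / n,
        "precision": sum(precisions) / n,
        "f1": sum(f1s) / n,
        # Overall accuracy: correctly classified samples over all samples.
        "accuracy": _ratio(sum(cm[k][k] for k in range(n)), total),
    }


# Hypothetical two-class matrix, not data from the paper.
toy_cm = [[2, 0], [1, 1]]
print(macro_metrics(toy_cm))
```

Macro averaging weights every device class equally, which matters when sample counts per class differ as they do in Table 5. A weighted or micro average would give different values on the same matrix.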
4.3 Experimental Results and Analysis
To evaluate the effectiveness of the proposed method, this paper conducts experiments on two datasets (IoT Sentinel and UNSW) and compares the results with those of other methods. The experiments cover the identification of device categories (IoT device manufacturers) and device types (specific device models). The proposed method extracts device fingerprints from device network traffic, obtaining 550 fingerprints representing 27 device types from IoT Sentinel and 352 fingerprints representing 21 device types from UNSW.
Many researchers in the field have explored the use of multiple machine learning models to effectively utilize heterogeneous features extracted from various perspectives and selected the best-performing model for IoT device identification. Currently, widely used models include Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (kNN). Among these models, the Random Forest model has demonstrated superior performance in many studies. Therefore, this paper directly chooses Random Forest as the classification model.
As described in Section 3, this study focuses on short-term device traffic. IoT Sentinel records device access traffic, which is short-term by nature. In contrast, UNSW covers the traffic of each device over the whole day, so the short-term traffic must be extracted from it. To obtain short-term traffic samples that suit the research needs, this study extracts the traffic within a few minutes after a device starts communicating and determines the best segmentation time by comparing different segmentation thresholds. The evaluation indicators and the feature extraction time for different segmentation thresholds are shown in Fig. 7.
Fig. 7(a) shows how the evaluation indicators (recall, precision, accuracy, and F1 score) change with the segmentation threshold. While the threshold is below 4 minutes, all metrics rise as the threshold increases. Once the threshold exceeds 4 minutes, the metrics increase only slightly and remain essentially flat. Fig. 7(b) presents the feature extraction time for different segmentation thresholds, which grows as the threshold increases. Although feature extraction takes slightly longer at 4 minutes than at 3 minutes, the corresponding gain in the evaluation metrics is larger. Therefore, the segmentation threshold is set to 4 minutes.
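The short-term extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the paper: the packet representation `(timestamp, device_id, size)`, the function name `extract_short_term`, and the choice to start the window at each device's first packet in the capture are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed packet record: (timestamp in seconds, device identifier, size in bytes).
Packet = Tuple[float, str, int]


def extract_short_term(
    packets: List[Packet], threshold_min: float
) -> Dict[str, List[Packet]]:
    """Keep, per device, the packets seen within threshold_min minutes
    of that device's first packet in the capture."""
    window = threshold_min * 60.0
    first_seen: Dict[str, float] = {}
    segments: Dict[str, List[Packet]] = {}
    # Process packets in time order so the first packet per device is its earliest.
    for pkt in sorted(packets, key=lambda p: p[0]):
        ts, device, _size = pkt
        start = first_seen.setdefault(device, ts)
        if ts - start <= window:
            segments.setdefault(device, []).append(pkt)
    return segments


# Hypothetical capture, not data from the UNSW dataset.
capture = [
    (0.0, "A", 60), (100.0, "A", 60), (250.0, "A", 60),
    (10.0, "B", 40), (250.0, "B", 40),
]
for minutes in (1, 2, 3, 4, 5):
    kept = extract_short_term(capture, minutes)
    print(minutes, {dev: len(pkts) for dev, pkts in kept.items()})
```

Sweeping `threshold_min` as in the loop above and recording both the resulting metrics and the extraction time for each value would reproduce the kind of trade-off that Fig. 7 reports.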
4.3.1 Device Category Identification
This paper groups devices into categories by manufacturer name, or by device name when a manufacturer contributes only a single device. Under this grouping, the IoT Sentinel and UNSW datasets are divided into 12 and 15 categories, respectively. The specific categories are listed in Table 4 and Table 5. The features extracted for each category are input into the classification model of the proposed method, and the resulting confusion matrices are shown in Fig. 8.
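The category grouping can be checked directly against Table 4 by resolving each ‘*’ entry to the manufacturer in the nearest preceding row. The sketch below is illustrative only. The function name `resolve_manufacturers` is hypothetical, and the rows are the Manufacturer and Device Type columns of Table 4.

```python
from typing import List, Tuple

# (Manufacturer, Device Type) pairs from Table 4; "*" means "same manufacturer as above".
TABLE_4_ROWS: List[Tuple[str, str]] = [
    ("Aria", "Aria"), ("HomeMaticPlug", "HomeMaticPlug"),
    ("Withings", "Withings"), ("MAXGateway", "MAXGateway"),
    ("Philips Hue", "HueBridge"), ("*", "HueSwitch"),
    ("Ednet", "EdnetGateway"), ("*", "EdnetCam"),
    ("Edimax", "EdimaxCam"), ("*", "EdimaxPlug1101W"), ("*", "EdimaxPlug2101W"),
    ("Lightify", "Lightify"),
    ("Belkin", "WeMoInsightSwitch"), ("*", "WeMoLink"), ("*", "WeMoSwitch"),
    ("D-Link", "D-LinkHomeHub"), ("*", "D-LinkDoorSensor"), ("*", "D-LinkDayCam"),
    ("*", "D-LinkCam"), ("*", "D-LinkSwitch"), ("*", "D-LinkWaterSensor"),
    ("*", "D-LinkSiren"), ("*", "D-LinkSensor"),
    ("TP-Link", "TP-LinkPlugHS110"), ("*", "TP-LinkPlugHS100"),
    ("Smarter", "SmarterCoffee"), ("*", "iKettle2"),
]


def resolve_manufacturers(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace each "*" with the most recent explicit manufacturer name."""
    resolved: List[Tuple[str, str]] = []
    current = None
    for manufacturer, device_type in rows:
        if manufacturer != "*":
            current = manufacturer
        if current is None:
            raise ValueError("first row cannot use '*' as manufacturer")
        resolved.append((current, device_type))
    return resolved


resolved = resolve_manufacturers(TABLE_4_ROWS)
categories = sorted({manufacturer for manufacturer, _ in resolved})
print(len(resolved), "device types in", len(categories), "categories")
```

Each resolved manufacturer name then serves as the class label for category identification, while the device type column serves as the label for type identification.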
Fig. 8(a) shows that in the IoT Sentinel dataset, apart from 0.3% of the Edimax category being misclassified as Ednet, all device categories were correctly identified (100% accuracy). Fig. 8(b) shows that 12 of the 15 device categories in the UNSW dataset were correctly identified, and one of the remaining categories still reached an accuracy of 99%. These results demonstrate that the proposed method not only achieves a high classification rate in identifying manufacturers (device categories) but also exhibits strong generalization ability, making it a reliable approach for IoT device classification across a variety of categories.
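Percentages such as those read from Fig. 8 correspond to a row-normalized confusion matrix, in which each row gives the share of one true category assigned to each predicted category. A minimal sketch of how such a matrix is built from label lists is given below. The function names and the toy labels are hypothetical and do not reproduce the paper's data.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """Count samples: rows are true categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        cm[index[true_label]][index[pred_label]] += 1
    return cm


def row_normalize(cm: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total so that every non-empty row sums to 1."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Toy labels for illustration only.
labels = ["Edimax", "Ednet"]
y_true = ["Edimax"] * 4 + ["Ednet"] * 2
y_pred = ["Edimax", "Edimax", "Edimax", "Ednet", "Ednet", "Ednet"]
for label, row in zip(labels, row_normalize(confusion_matrix(y_true, y_pred, labels))):
    print(label, [f"{share:.1%}" for share in row])
```

In the normalized matrix, the diagonal entry of a row is the recall of that category, and any off-diagonal entry identifies which category absorbs the misclassified samples.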
4.3.2 Device Type Identification
To verify the classification performance of the proposed method for device types, corresponding experiments were conducted on the IoT Sentinel and UNSW datasets, which were divided into 27 and 21 device types, respectively. The detailed experimental results can be found in Table 6.
Table 6 Performance metrics of the proposed method
Table 6 shows that the proposed method performs well on both datasets. On the IoT Sentinel dataset, recall, precision, accuracy, and F1 score all exceed 91%, indicating a high level of classification accuracy. On the UNSW dataset, these metrics all exceed 98%. Notably, the performance on the UNSW dataset is considerably higher than that on the IoT Sentinel dataset. This disparity can be attributed to the composition of the datasets. The IoT Sentinel dataset contains similar devices from the same manufacturer with similar purposes, such as TP-LinkPlugHS110 and TP-LinkPlugHS100, which makes it difficult to fully distinguish these devices based solely on the network traffic of their initialization processes.
4.3.3 Comparison with Existing Methods
To evaluate the feasibility and effectiveness of the proposed method, it was compared with the methods in previously published literature [12], [13], and [19]. The comparison results are shown in Table 7 and Table 8.
Table 7 lists the indicators of device category identification for the different methods. The proposed method performs well on both datasets, with all indicators reaching 99%. The other three methods also achieved relatively high identification results on both datasets, with accuracy rates exceeding 92%. Table 8 lists the indicators of device type identification for the different methods. The proposed method again performs well on both datasets, especially on the UNSW dataset, where all indicators exceed 98%. The other three methods also performed well on the UNSW dataset, with accuracy rates exceeding 91%. In summary, the proposed method consistently outperforms the other methods on both datasets. These findings underscore the feasibility and effectiveness of the proposed approach for IoT device identification and classification.