Fog computing application of cyber-physical models of IoT devices with symbolic approximation algorithms

Smart manufacturing systems based on cloud computing deal with large amounts of data for various IoT devices, resulting in several challenges, including high latency and high bandwidth usage. Since fog computing physically close to IoT devices can alleviate these issues, much attention has recently been focused on this area. Fans are nearly ubiquitous in manufacturing sites for cooling and ventilation purposes. Thereby, we built a fan system with an accelerometer installed and monitored the operating state of the fan. We analyzed time-series data transmitted from the accelerometer. We applied machine learning under streaming data analytics at the fog computing level to create a fan’s cyber-physical model (CPM). This work employed the symbolic approximation algorithm to approximate the time series data as symbols of arbitrary length. We compared the performance of CPMs made with five time-series classification (TSC) algorithms to monitor the state of the fan for anomalies in real time. The CPM made with the BOSS VS algorithm, a symbol approximation algorithm, accurately determined the current state of the fan within a fog computing environment, achieving approximately 98% accuracy at a 95% confidence level. Furthermore, we conducted a posthoc analysis, running statistical rigor tests on experimental data and simulation results. The workflow proposed in this work would be expected to be utilized for various IoT devices in smart manufacturing systems.


Introduction
In cloud computing, smart manufacturing is built with various IoT devices. A managing intelligence observes the states of devices and takes necessary actions immediately to prevent disruption in manufacturing if any problems arise. This intelligence incorporated with domain-specific knowledge can be labeled as a cyber-physical model (CPM) [1]. However, there are some practical difficulties in creating CPM for real-world phenomena. For example, IoT devices are spatially densely installed in a manufacturing environment; thus, lessening device heat is essential in manufacturing. Although a fan is primarily used for cooling or ventilation, a study on cyber-physical modeling of fans has not drawn as much attention as installed devices. If the fan stops working; as a result, the device's temperature may rise and eventually fail to function correctly, causing significant interruptions to the entire manufacturing system. It is reported that a cooling fan is one of the top 10 failing components in electronic products [2]. Therefore, a study on the state of a cooling fan in smart manufacturing would be practically meaningful.
In general, the state of a fan can be monitored by analyzing data sent from sensors alongside devices on the site. The transmitted data from the sensor is Time Series (TS) with a temporal structure, so it requires non-conventional algorithms than those used for typical tabular data. A conventional data analytics algorithm, including machine learning, is suitable for dealing with pooled data, making it inappropriate to apply directly to TS. The pooled data is data that, unlike TS, does not consider the temporal sequence. As a non-conventional algorithm, Time Series Classification (TSC) algorithm [3] has been known as an algorithm that best fits TS for classification because it takes raw TS to keep its temporal structure intact. The role of the TSC algorithm is to compare TS sent from the sensor and classify it by predefined class. In this way, this algorithm helps to get the fan's state in real time. TSC algorithms can be regarded as the core of CPM, determining which of a set of designated classes in TS belongs. The classes for classification here are defined by multiple collections of data for machine learning.
TSC algorithms can be divided into three types according to how to interpret TS [4]. The first is the similarity-based technique, taking raw TS as input. However, this technique has disadvantages, such as relatively long computation time and weak scalability. The second type is conventional machine learning, disregarding the temporal structure of TS. However, in so doing, most of the information related to the time domain may be lost. A third type, the symbolic approximation technique, has recently drawn much attention, in which raw TS is converted into symbols of arbitrary length. This type's advantage is the dramatic reduction of high dimensionality and noise. The BOSS algorithm has been recognized as a state-of-the-art [5], which was adopted for creating the final CPM in this work. This algorithm runs a sliding window across each series, discretizes the window to form a word, develops a histogram of word counts over the dictionary, then constructs a classifier of the histograms. Once a TSC algorithm is chosen, the next would be having physical laws incorporated with CPM.
Thereby, in creating CPM, domain-specific knowledge plays a crucial role [6][7][8][9]. The complexity of phenomena occurring around the cooling fan can immediately go beyond our understanding in a real-world situation. Hence, there are barriers to establishing and solving physical laws or governing equations to determine the effect of each unknown element on the state of the fan. There are too many unknowns associated with physical interactions, even for a simple IoT device like the fan. Hence, setting up and directly solving such equations is not easy. We employed machine learning with TSC algorithms, a data-driven modeling approach, to resolve the complicated issues in this work. We set up a cooling fan system with an accelerometer installed for collecting experimental data. The physical elements related to the state of the fan can be acoustic noise, wind speed, rotation, vibration, counter-wind, and pressure change, as shown in Fig. 1. In order to set up data-driven modeling, devices for data generation are required, such as sensors. The larger the number of sensors mounted, the more accurate the classification model is. However, as the number of transmitted data increases and the uncertainties grow, making a lightweight CPM work efficiently over limited computing resources makes it challenging.
Upon considering the implementation of intelligence [10] onto a less powerful computing environment, CPM Fig. 1 A cyber-physical model is expected to sufficiently explain the effect of real-world elements such as acoustic noise, mechanical failure, temperate change, revolution (rpm), the pressure distribution of blades, vibration, counter-wind occurrence, and wind velocity, etc must be sufficiently adequate and lightweight compared to that in cloud computing. Fog computing refers to a sublayer of an IoT system shown in Fig. 2. By CPM running in fog computing, we may expect IoT devices' operating states to be fully aware without relying heavily on cloud computing [11]. In cloud computing, efforts are being made to ensure that heavy loads are efficiently distributed as various applications require immediate attention in real-time [12,13]. Thus, much research in recent years has been focused on smart manufacturing in fog computing [14,15].
Several advantages of CPM created in this work can be addressed: Firstly, it can dramatically reduce the amount of data directly transmitted from the sensor to cloud computing. In other words, an intelligent CPM close to the sensor can self-determine the device's state and notify the administrator only at critical moments. Secondly, the types of sensors and data are as diverse and heterogeneous as the devices used in smart manufacturing. For example, the dispatch cycle of data sent by sensors is generally not the same. While there are sensors that send data more frequently, sensors send at considerable time intervals. Therefore, if intelligence can be materialized to account for this time interval, the problem of inconsistency over the data transfer cycle can also be solved spontaneously. This discrepancy is also directly related to latency and bandwidth in cloud computing. Therefore, intelligence operating on fog computing is vital, which can alleviate the heavy burden on cloud computing.
We demonstrated how to create an intelligent CPM for the cooling fan that worked with great accuracy, demanding the least amount of computing resources. As a result, we learned that in fog computing, which has a minimal computing resource compared to cloud computing, CPM could be used to ascertain the state of the fan with the high performance considering possible uncertainties in real-world situations. Therefore, the workflow proposed in this work may increase overall production efficiency by expanding the utilization of fog computing in smart manufacturing.
The paper is organized as follows: The Related work section discusses the related work. A detailed explanation of how to set up CPM, closely following Bayes' rule, is shown in CPM set up with Bayes' rule section. The Symbolic Fourier Approximation (SFA) section explains the symbolic approximation algorithm with sub-algorithms. Bag-of-SFA-Symbols VS section presents the BOSS algorithm and an extended BOSS VS algorithm. Results and discussion section explains the experimental implementation of the fan with the sensor, the exploratory data analysis, and multiple comparisons of CPM types with different algorithms. Lastly, the Conclusion section concludes this paper and discusses the further study.

Related work
This study aims to create a cyber-physical model (CPM) [16] that can accurately determine the operating state of the cooling fan as an IoT device in fog computing. Time series classification (TSC) algorithms that can correctly interpret the time series (TS) transmitted from the accelerometer attached to the fan are required. Domainspecific knowledge, which can describe the fan's characteristics, is also essential. A time-series T i is defined as an ordered sequence T i = {t 1 , . . . , t n } . The univariate TS has only one feature ( d = 1 ); on the other hand, the multivariate TS has several features ( d > 1 ), that is ∀j, t j ∈ R d , where d is the number of features. Fig. 2 Three levels of IoT system. As the top-level, IoT employing a cloud system associated with big data analytics. Fog computing resides in the middle, equipped with fast streaming data analytics. At the bottom level, IoT devices such as sensors are located with consecutive temporal data requiring real-time data analytics Although research on TSC has drawn much attention for years, most results are related to accuracy improvement and not scalability. However, recent IoT sensordriven applications have to deal with precisely these scaled data in classification, making methods unimportant that do not scale [17,18]. In particular, TSC working in fog computing has to meet stringent limits over the usage of resources. Therefore, TSC algorithms should be lightweight, fast, and accurate in a fog computing context. Next, the three most widely used TSC algorithms are discussed.
Firstly, One-Nearest-Neighbor and Dynamic Time Warping (1-NN DTW) [19] is a TSC algorithm that defines the similarity among multiple TS by calculating the distance of each element of TS for comparison. The principle of the algorithm is that the farther away the two TS data are in distance, the less alike they are. Thus, from the perspective of TSC, it is easy to determine to which class a given TS belongs. The algorithm has been used for years because it is straightforward and intuitive. Moreover, this algorithm is still used as a baseline to validate the performance of newly proposed algorithms; however, as the number of data increases, the amount of computation and the memory required grow, making it challenging to utilize in fog computing.
The second type of TSC includes machine learning techniques such as KNN [20], artificial neural networks (ANN) [21], and Long Short-Term Memory (LSTM) [22]. However, those algorithms may not be considered genuine TSC from a strict perspective because they do not consider TS's temporal structure. Therefore, valuable information on the time domain might be lost. In addition, this algorithm is commonly called a black box type because it is intricate to understand the process by which the results are derived.
Thirdly, the symbolic approximation algorithm has attracted much attention recently. The algorithm has brought a breakthrough in dealing with large amounts of TS. The fundamental principle of the algorithm is to divide continuous TS into appropriate intervals and replace it with symbols of arbitrary length. Bag-Of-SFA-Symbols (BOSS) [23,24], Symbolic Aggregate approximation (SAX) [25,26], and ExtrAction for time SEries cLassification (WEASEL) [27] have been most famous. WEASEL builds on the bag-of-patterns (BOP) approach. It activates a sliding window over TS and extracts features in each window fed into TSC on machine learning training. Still, we found that the BOP method tends to take longer in computation than the BOSS algorithm [28]. SAX is a symbolic approximation method used in many applications. In SAX, Piecewise Aggregate Approximation (PAA) is utilized to discretize the mean value of each segment. However, this approach considers only the mean value of each segment; thereby, it often may fail to distinguish different time series with similar mean values, resulting in relatively less accurate classification. Furthermore, it has the disadvantage of constantly recalculating means for new data. The BOSS algorithm is based on Symbolic Fourier Approximation (SFA) [29], and SFA consists of Discrete Fourier Transform (DFT) for approximation and Multiple Coefficient Binning (MCB) for quantization. BOSS is an algorithm running a sliding window across each series, discretizing the window to form a word, developing a histogram of word counts over the dictionary, then constructing a classifier in the form of the histograms.
As for the subject of TSC, two contributions, which we made through this work, can be presented as follows: The first is about extending algorithms into ones that can deal with multivariate TS. Most of the previously published studies on TSC relate to the univariate TS. However, TS transmitted from IoT devices is more likely multivariate TS [30]. For example, this work transmitted multivariate TS data from a multi-channel acceleration sensor mounted on the fan. TSC handling the multivariate TS is much more complex than that for the univariate TS because they also need to find correlations between multiple TS data at each time element. We employed BOSS, WEASEL, and other algorithms to cope with the issue and modified algorithms to address multivariate TS. Eventually, the BOSS algorithm, recognized as state of the art in the field, resulted in the best CPM for accuracy and scalability in this work. The second is of use of real-world data mixed with noise in experiments. In papers published on TSC algorithms [31,32], most results were obtained using publicly available data in the UCR machine learning repository [33]. In order to compare and analyze the various algorithms, it is common to verify the results using well-known data. However, there is an issue that domain-specific knowledge, which can account for real-world phenomena, may not be directly applicable to CPM because using the data provided by the UCR repository is not directly experimented with by researchers. Thereby, it is believed that using data with much noise obtained in real-world situations might yield somewhat different results, which is also pointed out in literature as a concerning issue [34]. Therefore, we established an experimental system in this study and considered real-world data that might be observed in actual manufacturing instead.
CPM should be created for a comprehensive account of real-world phenomena using as much information as possible about the fan, commonly referred to as domainspecific knowledge. The traditional way is to obtain information through experiments, and numerical simulations [35]. Computer simulations of a fan performance require physical laws or government requirements related to fluid dynamics. Coggiola et al. [36] employed the computational fluid dynamics that interprets the flow equations: Steady and Unsteady Reynolds-averaged Navier-Stokes equations [37]. However, since these equations are given in partial differential equations with many variables for inputs, it is not easy to solve even numerically and require considerable computing resources [38]. In addition, multiple-input variables are needed to solve the equations, for example, a fan's velocity, pressure, temperature, and humidity. For these input variables, sensors capable of measuring corresponding physical elements are required. In other words, different sensors for flow velocity, pressure change, temperature variance, and humidity measurement are needed, respectively. Furthermore, as the number of sensors increases, the creation of CPM can become more involved.
Thus, efforts to create CPM recently have received significant attention by actively exploiting data from sensors. For example, to comprehend the effect of the vibration of a fan, another physical law that can relate the vibration and flow through a fan is required, which may not be known to date. Consequently, traditional methods have encountered many complications that cannot be quickly answered. Oh et al. [39] monitored the state of a cooling fan by applying three parameters: acoustic noise mission, shaft rotational speed, and current. However, it requires at least three different sensors. For better accuracy of models, other parameters need to be sought that can affect the state of the fan, and it might take quite some time to find out. Jin et al. [40] used vibration signals to determine the fan's state and find similarity by calculating Mahalanobis distance (MD) for TS data to compare with a baseline. However, this method has the disadvantage of being computationally intensive and weak to scalability as a TSC, similar to the 1-NN DTW introduced earlier. Once a data-driven modeling approach is employed rather than the conventional one, domain-specific knowledge must be assimilated into CPM somehow. Practically, domain-specific knowledge could be implemented by determining the type or number of sensors mounted on the fan. Therefore, machine learning techniques can be utilized to analyze data to find essential features. Consequently, much domain-specific knowledge is needed when building a model. This study installed an accelerometer that detects the fan's movement to identify the fan's state. Thus, some domain-specific knowledge may come from the accelerometer. The next task is how we attained a lightweight CPM working in fog computing.
As the type and number of connected devices increases, cloud computing becomes more challenging to manage. Such problems include latency, bandwidths, and scalability [41]. If the latency between cloud computing and the device is high, it is not easy to cope with real-time events. In addition, specific IoT devices often transmit more data than others, which poses a significant challenge in cloud computing bandwidth. Moreover, scalability issues cannot be overlooked because large amounts of data must be processed at a time. Thus, it would be more convenient in many ways to implement intelligence in fog computing near devices so that we can be fully aware of the situation without relying heavily on cloud computing. Accordingly, much research in recent years has been focused on smart manufacturing applications in fog computing: IoT deployment in fog computing [42], IoT data scheduling in fog computing [43], IoT task scheduling in fog computing [44], and IoT tasks offloading in edge-cloud environments [45]. However, those research is not for specific real-world systems. In this study, taking advantage of machine learning with TSC algorithms, we could develop an efficient and lightweight CPM of the fan system with minimal sensors installed. We demonstrated that this model worked smoothly in fog computing.
We obtained two promising results in this work: The first was reducing the high dimensionality of incoming TS from the sensor using the BOSS algorithm. Second, the CPM with the BOSS algorithm acted as a low-pass filter, diminishing the noise significantly. An intelligent system for the operating state of the cooling fan was explained in detail. We demonstrated that such intelligence as CPM works smoothly in fog computing to accurately determine the device's state by analyzing the data transmitted through the IoT device or sensor in real time. Next, a detailed discussion on setting up CPM, closely following Bayes' rule, is presented.

CPM set up with Bayes' rule
In this study, the CPM was sought to classify the fan's state (or class) solely based on the sensor data. The class set C = {C 1 , · · · , C K } with K = 3 classes that are normal, counter-wind, and mechanical failure states of the fan. The CPM can be expressed as a function f : T → C , where T = {T 1 , . . . , T N } , and in probabilistic context, it can be expressed as a joint probability distribution p(C, T) . However, finding the mutual relationship between C and T in an actual situation to complete the joint probability is not straightforward. For example, after observing only the TS transmitted from the IoT sensor installed in the fan, it is challenging to determine the state of the fan at once. Therefore, to alleviate this difficulty, we employed a technique of reducing the unknowns using a marginalization based on the Bayes' rule [46], which is handy to describe whole processes here logically. With the rule, the maximization over a class C on the posterior p(C|T) is equivalent to the function f (T): The denominator of Eq. (1) is not related to class C, as a result, the CPM function f (T) can be approximated into the multiplication of the likelihood p(T|C) and the prior p(C) . This function is internally composed of histograms for symbols, so it retains the nature of probability, a fascinating result. Because most machine learning models learned from the data are given in the form of linear or nonlinear algebraic equations, not a probability. If the classification results of a CPM are presented as probabilities, we could interpret the result in a probabilistic way; thus, the higher the probability, the more probable the state of the fan becomes. In addition, we compared the classification results for the five different CPMs by performing rigorous statistical tests of accuracy and scalability.
Once the CPM is complete, labeling a new TS data Q = {q 1 , · · · , q m } and ∀i , q i ∈ R d for i = 1 . . . m with d being the feature can be performed shown in Eq. (2). (1) That is, label(Q) is a formula for the classification process. It is noted that p(C|Q) can be interpreted as the entity with newly arrived data Q plugged into p(C|T) . Internally, this process is implemented in vector multiplication, and mathematical details of the BOSS algorithm will be explained in the following sections.
An entire procedure of creating a CPM here is schematically diagrammed in Fig. 3, divided into three phases: observation, machine learning, and status update. In the observation phase, streaming data is stored for 3T sec. Once it passes 3T sec, Machine Learning trains the model for 2T sec. It should be noted that even while learning the model, the new data is still coming in to be stored. After the training is finished, a CPM conducts classification over new data for T sec. It is important to note that Machine Learning and Status Update must complete within 3T sec; otherwise, this process can be out of work.

Symbolic Fourier Approximation (SFA)
This section introduces Symbolic Fourier Approximation (SFA) used in the BOSS algorithm. SFA consists of Discrete Fourier Transformation (DFT) and Multiple Coefficient Binning (MCB).
size c, the breakpoints β j (0) < · · · < β j (c) for each column M j are generated. Each bin is labeled by applying the ath symbol of the alphabet A l to it. For all combination of (j, a) with j = 1, . . . , l and a = 1, . . . , c , the labeling symbol(a) for M j can be done by It is noted that this process in Eq. (5) applies to all training data.  Fig. 5b. Next, MCB is conducted with an alphabet set A 1 = {aa, ab, ba, bb} as shown in Fig. 5c. Thereby, an SFA word of the first sample is mapped into a word ab shown in Fig. 5d. Likewise, the other samples can be transformed into their respective SFA words.

Bag-of-SFA-Symbols VS
The Bag-Of-SFA-Symbols VS (BOSS VS) represents the time series representation with the structure-based representation of the bag-of-words model. The sequence of SFA words for six samples in Fig. 5d reads as follows: The values that count the appearance of SFA words in Eq. i = 1, 2, . . . , (n − w + 1) , where A is the SFA word space and l ∈ N is the SFA word length. The BOSS histogram B(S) : A l → N . The number in the histogram is the count of appearance of an SFA word within T upon numerosity reduction. BOSS VS model allows frequent updates, such as fast streaming data analytics. As shown in Fig. 6a and b, the BOSS VS model operates sliding windows unto each time series resulting in multiple windowed subsequences. Next, each subsequence is transformed into the SFA words shown in Fig. 6c. All of the subsequences eventually result in the BOSS histogram shown in Fig. 6d. However, since the BOSS histogram itself is not suitable for performing multiple matrix calculations, it is vectorized through Term Frequency Inverse Document Frequency (TF-IDF) algorithm shown in Fig. 6e.

Term frequency inverse document frequency
The BOSS VS model employs Term Frequency Inverse Document Frequency (TF-IDF) algorithm to weight each term frequency in the vector. This assigns a higher weight to signify words of a class. The term frequency tf for SFA words S of a time series T within class C is defined as where B(S) is the BOSS histogram in Eq. (7). The inverse document frequency idf is given by In this study, for classification purposes, we employed three different states of a running fan, which is presented as a set of classes (states) C . The elements of the set are C = {C 1 , C 2 , C 3 } . It is noted that each element C k for k = 1, 2, 3 represents a certain state of the fan. As for the human-readable format, we have assigned name-tags to each class such as C 1 = "Normal ′′ , C 2 = "Counter Wind ′′ , and C 3 = "Mechanical Failure ′′ , respectively. The inverse document frequency indicates the frequency of an SFA word in a class C k . Therefore, in this study, the numerator (8) The result of tfidf(S, C) on three states is displayed in Fig. 6e. It is noted that a high weight in Eq. (10) is obtained by a high term frequency in the given class.

Classification
Classification of new data Q can be carried out using the cosine similarity metric CosSim: It is noted that in Eq. (11) tf(S, Q) is of the term frequency of Q as shown in Fig. 7b, which is the BOSS (10) histogram of Q . Then, CosSim(Q, C) is calculated in Eq. (11). Upon maximizing the cosine similarity, a query Q is thus classified into the class C k as shown in Eq. (12): In conclusion, the BOSS VS algorithm comprises two notions: Bag-of-words and TF-IDF. What makes the BOSS VS different from other algorithms is a way of taking features of data. This algorithm does not construct a loss function like other machine learning algorithms but uses Bag-of-Words instead. With time series being transformed into sequences of symbols, Bag-of-words approaches are then used to extract features from these sequences. Time series are presented as histograms with designated symbols. And then, each histogram is transformed into TF-IDF vectors for classification. We have discussed building a model with quite a complexity; thereby, we sorted out the procedure step by step with a lookup

Experiments
The experimental apparatus is a three-blade fan on the wheels with a low-power digital accelerometer made in Analog Device (ADXL345 GY-80) as shown in Fig. 8. The dimension of the apparatus the width of 18.5 mm, length of 12.3 mm, and height of 30 mm. We considered three of the most probable states of the fan we can think of in a real-world situation for classification. The normal state where the fan runs without any noticeable event (see left pane in Fig. 8). The counter-wind state occurs when the counter-wind intermittency against the fan (see center pane in Fig. 8). The mechanical failure state where one of the blades is broken off (see right pane in Fig. 8). The average rotational speed of the fan was 114 rpm in the normal state, 41 rpm in the counter-wind state, and 107 rpm in the mechanical-failure state, respectively. Each data set was collected at a sampling rate of 100 Hz for 3 seconds from the accelerometer. For example, we built 2,700 data sets (900 data sets for each fan state). Thus, we made balanced data, and each state has the same amount of data. For example, 900 samples for each state of the fan were collected via x and y channels, so the number of data points sums 900 samples × 2 channels × 300 × 3 states = 1, 620, 000 . It took 2 hours and 15 minutes to collect 1,620,000 data points at each measurement. The experiment data sets are shown as a set of time series along with mean and standard deviation into three states in Fig. 9. For the learning model, we divided the whole data into training and test data with a ratio of 9:1. That is, 270 data sets out of 2,700 were assigned to test data. We performed 10-fold cross-validation over five scenarios, respectively. The cross-validation results are shown in Table 6. For comparison, we listed two different accuracies: non-cross-validation and cross-validation. In scenario II, where 900 data sets are used for machine learning, one of five models, the 1-NN DTW, crashed during the simulations. Such an issue may be caused by limited computing resources in the fog-computing facility. Thus, the cross-validation over the 1-NN DTW model could not go forward for larger data sets (listed as N/A in Table 6).
In this study, the operating state of the fan is classified into three categories: normal state, counter-wind state, and mechanical failure state. In Fog computing, CPM is learned by machine learning using data from acceleration sensors and five different TSCs and analyzing the newly input data to distinguish the state of the fan. Fog computing in this study was Intel Core i3-8100 CPU 3.60 GHz at 4 GB memory on Ubuntu 20.04.1 LTS.

Exploratory data analysis
Exploratory Data Analysis (EDA) is carried out to identify data property. In Fig. 9, the raw data, its rolling mean, and the standard deviation are overlaid. Since raw data contains much noise, it is necessary to filter it out for a better analysis. The rolling mean is one such filtering tool. The standard deviation can be used for estimating the variance of data. In addition, we need to know how many trends, repetition over intervals, white noise, and other uncertainty are. We should never take these data characteristics lightly because they can determine the authenticity of the experiment. We employed the Augmented Dickey-Fuller (ADF) test with the null hypothesis of whether it was stationary. The test results of p-value ≤ 0.05 as shown in Table 2 for the three states with time series from experimental data; therefore, we can reject the null hypothesis at a 95% confidence level. Thus, we may affirm that the experimental data was stationary.

Comparison of models
We employed five different models: WEASEL MUSE, BOSS VS, random forest (RF), logistic regression (LR), and one-nearest neighbor DTW (1-NN DTW). Table 3 describes the characteristics of five models according to temporal structure (Temporal), low-pass filtering (Filter), transformation (Transform), and critical features (Features). Only 1-NN DTW keeps the temporal structure of data, and the others do not consider the order of data points over time. Algorithms for feature extraction are χ 2 test for WEASEL MUSE, Term-Frequency Inverse Document Frequency algorithm for BOSS VS, Entropy for RF, the cost function for LR, and Euclidean distance for 1-NN DTW. Table 4 shows the numerical expression of the trained model p(C|T) in Table 1, which is the result of vector tfidf(S, C) calculated using training data. The symbolic algorithm SFA converts the whole training data to S = {aa, ab, ba, bb} . For example, the features of the normal state (C 1 ) are aa , ab , ba , and bb with the numerical values (3.5649, 3.5649, 3.5649, 3.6390) as displayed in Table 4. For the counter-wind state ( C 2 ), the value reads (3.1972, 2.9459, 3.3025, 3.3025), which is clearly distinguished from those of the normal state. Table 5 shows the classifier p(C|Q) in Table 1 for Q . For example, the first sample Q 1 is predicted as the normal state because of the largest value of 0.9990 throughout the column to which it belongs. In the same fashion, the classification is performed for the remaining time series, such as the counter-wind state for Q 2 , the mechanical failure state for Q 5 , etc.

Post-Hoc analysis
Often in many studies, the results tend to be presented without statistical rigor. However, it is essential to check if it was statistically significant before further discussion, called Post-Hoc analysis. As shown in Table 6, the BOSS VS model indicates the highest accuracy for all data sizes. In addition, even though the number of data is increased from 180 k to 1.6 million, it shows a bit change in accuracy, so we may conclude that the BOSS VS model is not significantly affected by the data size. The smaller the number of data used, the shorter the run time, but on the other hand, the model tends to be overfitted to the data. For example, in scenario I, we used 180,000 data points; for the BOSS VS model, the accuracy turned out to be 100%. This result indicates overfitting, where we used too little data for training. If the number of data is increased, the time for preprocessing and calculation also increases accordingly. Machine learning models tend to suffer from overfitting one way or another for several reasons. The reviewer correctly pointed out that using a proper validation method is essential. Furthermore, the most apparent reason might be lacking data. In machine learning communities, it is well known that too little data for simulation does not bring significant results. In this study, we experienced overfitting when only 300 data sets were used for simulation. As we used more data, such overfitting does not appear anymore. Therefore, more data can be a cure for the issue in this particular case. Table 6 does not tell whether the difference in the accuracy of each model is statistically meaningful. Another ambiguity arises in the results of the run time. Thus, we carried out the ANOVA (Analysis of Variance) test that provides statistical significance among differences. Table 7 shows the ANOVA test result for accuracy with F = 60.8 , and p-value ≤ 0.05 at a 95% confidence level. This result indicates that the null hypothesis is rejected. Therefore, we can say that the mean values for the accuracy of each model differ significantly. In addition, the difference in run time for each model is statistically significant, with F = 4.58 , and p-value = 0.008. In conclusion, we can confirm the simulation over five scenarios for accuracy and run time with five statistically significant models.
However, the ANOVA test results shown in Table 7 alone cannot tell which model is different in accuracy and run time from others. Thereby, another test should be carried out to see which model is significantly different from the others. We employed Tukey's Honest Significant Difference test (Tukey Test) for all pairwise comparisons while controlling multiple comparisons. In this study, the suitability of models was sought statistically in two aspects: accuracy and scalability.

Accuracy
The Tukey test result, which is multiple comparisons of accuracy from five models, is summarized in Table 8. Two cases, 1-NN DTW vs. WEASEL MUSE and LR vs. RF, are not statistically significant upon a 95% confidence level. This result implies that two pairs may have a remarkable similarity in making poor predictions. On the contrary, all pairwise comparisons with the BOSS VS model are proven statistically significant at a 95% confidence level. Figure 10 shows yet another aspect of the trend of accuracy over run time for all five models, where the BOSS VS model outputs a far outstanding performance in accuracy and run time. As a result,

Scalability
In general, a scalable model shows consistently good performance despite increasing the amount of data. Multiple comparisons of run time for five models are summarized in Table 9. All pairwise cases for the 1-NN DTW model vs. the other models are significant. Thus, we may conclude that the 1-NN DTW model is far less scalable. Figure 10 shows the scalability of models. Except for the 1-NN DTW model, the other models keep relatively small changes in the run-time subject to increasing data size. In Fig. 11, the scalability comparison of five models is shown. As the amount of data increases, the 1-NN-DTW model shows the worst scalability. We observed that the BOSS VS model performs excellent scalability with the best accuracy among the other models. Figure 12 shows the comparison of the 95% confidence interval (CI) of the accuracy of each model using experimental data of different sizes. The accuracy of the BOSS VS model fell into CI = 0.9872 ± 0.0073 , of which statistical behavior is much better compared to CI = 0.8205 ± 0.1319 in the second-place RF model. Moreover, the deviation of 0.0073 of the BOSS VS model is quite small compared to 0.1319 of the RF model. This    result explains good scalability, which indicates that the BOSS VS model is robust to changes in data size.

Conclusion
While most literature on the subject was published using the results using well-preprocessed public data, in this work, we implemented noisy real-world data into the classification models.
A primary goal of this study is to build an excellent cyber-physical model (CPM) of an IoT device with significant classification accuracy in fog computing. To this end, some critical issues must be resolved: one is of time series classification (TSC) algorithms and domainspecific knowledge that can identify the state of an IoT device, and the other is about how to process streaming sensor data in real-time properly. A data-driven modeling approach could alienate quite the burden of such complicated theoretical domain-specific knowledge by using a vast amount of data to take advantage of machine learning and statistical inference. With a large amount of data transmitted in real-time from a networked IoT device, it is significant to correctly classify the device's state, a core of smart manufacturing. Recently, there has been a growing tendency to solve this issue in fog computing close to IoT devices because of the heavy loading on cloud computing.
We studied a three-blade fan with an accelerometer installed for an IoT device to create CPMs that can classify the state of the fan. Using state-of-the-art TSC algorithms, upon the classification performance of five CPMs with real-world data, we achieved an accuracy of about 98% at a 95% confidence level with the BOSS VS algorithm, which resulted in excellent classification and significant size reduction in streaming data.
In this work, we produced a lightweight CPM that works well in fog computing, which can determine the state of the fan, but, in the field, since there are a large number of fans installed, creating models for each fan will