IoT-based Machinery Failure Predictive Solution using Big Data Analysis on the MIMII Dataset

One of the main focuses of smart industry is machinery failure prediction, and IoT-based solutions have been widely deployed to achieve it. However, data processing and decision making remain challenging. The lack of sufficient prior knowledge has been the primary limitation of statistical methods and supervised learning methods. Unsupervised learning methods are therefore gaining popularity, but they still struggle to capture the early signs of failure effectively because of the complexity of the training process and of visualizing the results. Previously, we proposed a novel Big Data Analysis method for audio/vibration data that captures the early signs of failure through data visualization, without complex learning or processing. We validated that proposal on a demo system. In the present work, we use part of the MIMII dataset to test the proposed analysis method on real-world-like data and to verify its validity on a more complex system. We show that we can detect abnormal machine behavior and predict failures without prior training or knowledge of the monitored machine.


Introduction
Every mechanical and electromechanical system is subject to anomalies. Some anomalies, such as rust or broken parts, are common to most machine types. Others are specific to a system: misalignment for a gear system, contamination for a valve, leakage for a pump, or clogging for a fan. Whatever the type of failure or its cause, a machine that is not diagnosed in time and maintained properly will eventually fail. An effective maintenance technique is therefore mandatory. Unfortunately, certain abnormalities still escape monitoring systems and have a catastrophic impact in terms of both safety and cost.
Several approaches have been developed: correcting anomalies when they happen, preventing them by scheduling machine maintenance, or staying a step ahead by predicting them. Predictive solutions appear to be the most effective maintenance technique in terms of safety and cost. Condition-Based Monitoring (CBM) is one of the most commonly used predictive maintenance solutions. This approach relies on monitoring the machine and is based on three key components [1]: data acquisition, data processing and decision making. With today's progress in sensor modules, wireless communication techniques, data storage solutions and Big Data Analysis methods, IoT-based solutions are being developed on a massive scale. Because it fulfills the three key components of a CBM approach, an IoT-based predictive solution is an effective application of a sensor-data-based CBM method.
Machines are commonly monitored through the acquisition of sensor data: voltage and current [2], temperature and pressure [3], vibration [4,5,6,7,8,9,10] and sound [11,12,13,14,15,16,17]. Vibration and sound have been reported to be effective sensor signals for characterizing machine behavior. Even a machine that operates properly vibrates and generates a certain sound. When an abnormality occurs, the frequency response of the machine shifts and the signal amplitudes change accordingly. With the outstanding progress in sensor technologies, high-precision data acquisition can now track the machine behavior. However, the data processing and decision-making phases remain a challenge for IoT-based failure predictive solutions, since sensor data analysis is still not efficient enough for accurate failure prediction.

The earliest algorithms for sensor data analysis were based on statistical approaches for detecting abnormalities. Those methods were primarily applicable to one-dimensional data [18] and gave only two possible predictions: poor or healthy condition. This left infinite possibilities in the grey zone and resulted in a high rate of false alarms or overlooked abnormalities, which made such methods poor candidates for machinery failure prediction. Despite efforts to extend the statistical approaches to multidimensional data, those methods suffered from "the curse of dimensionality" [19]. Learning methods dealt better with the multidimensional space by relying on an effective feature extractor, which led to better results in anomaly detection. However, in a real-world industrial environment, anomalies rarely happen within a short period of time. Moreover, it is impossible to cover all possible anomalies or failure scenarios, so it is difficult to construct an effective training dataset. This is the main limitation of applying supervised learning methods to machine failure prediction, despite their success in audio data classification and identification, for example in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Thus, unsupervised methods have gained popularity and are now more commonly used for machine failure detection.
In our previous paper [11], we proposed a Big Data Analysis method that can be applied to sound or vibration data collected from machines to detect anomalies and predict failures without prior training. Our method is intended to detect any type of anomaly, whether previously experienced or not. It makes signs of failure visible as soon as the machine starts to degrade. Such degradation signs can be tracked and detected from day 1, without an enormous training dataset or complex learning. In [11], we used a simple experimental environment to validate our hypothesis by mimicking failures on a miniature DC motor. Because of confidentiality and internal data safety measures in factories and plants, it is difficult to validate analysis methods academically with data from a real factory environment. Moreover, factories tend to have machines that are not supposed to fail for a long period of time. In [13], Purohit et al. affirmed that there were no public datasets focusing on the sound of industrial machines under normal and abnormal operating conditions in real factory environments. Purohit et al. then released the MIMII dataset [20], a sound dataset for Malfunctioning Industrial Machine Investigation and Inspection, covering several machines with normal and abnormal behavior. The purpose of the MIMII dataset is to help the machine-learning and signal-processing community develop automated facility maintenance [13].
In [11], we hypothesized that the core analysis remains effective for failure detection using vibration or sound data for different systems, without knowledge of the monitored machine or prior training. In the present paper, we use part of the MIMII dataset to test the validity of our core analysis on audio data recorded in a real factory environment. We aim to (1) verify that abnormalities can be differentiated from normal operation in a complex monitored machine, and (2) test our predictive scheme for detecting failures using only the normal data as reference data. We chose to work on the valve data because the MIMII authors commented in [13] on the difficulty of detecting valve anomalies, owing to the non-stationary nature of the sound signal.
The rest of the paper is organized as follows. We start by briefly reviewing the data analysis methods used for vibration-based and audio-based failure predictive systems. Next, we present the outline of our failure-predictive solution. Then, we apply our predictive scheme to part of the MIMII dataset to test whether abnormalities can be detected using only normal data as reference data. Additionally, we run our analysis scheme on the normal and abnormal data combined and verify the validity of our prediction results. Finally, we discuss our results and future directions and conclude the present work.

Related works
This paper applies our proposed data analysis method to machine anomaly detection. Anomaly detection methods are commonly divided into two categories, supervised and unsupervised. In [15], Koizumi et al. narrowed down the difference between the two categories to the definition of anomalies: supervised methods detect "defined" anomalies, while unsupervised methods tackle the detection of "unknown" anomalies. According to this classification, our method falls into the category of unsupervised methods for anomaly detection.
In this section, we briefly review related works (supervised and unsupervised methods) that used Big Data Analysis methods for machinery anomaly detection based on sound and/or vibration data. We also review what, to the best of our knowledge, has been developed and applied on the MIMII dataset.
In recent years, classification-based methods have been the most commonly used. According to [21], those methods learn a model (classifier) from a set of labeled data (training) and then classify a test instance into one of the classes using the learned model (testing).
Several approaches such as Neural Networks (NN) and Support Vector Machines (SVM) have been deployed for machine anomaly detection. In [4], Windau and Itti proposed the Inertial Machine Monitoring System (IMMS). This work tested the use of statistical features and of vibration frequency diagrams with SVM and NN to classify normal operation vs. 10 types of real-world abnormal equipment behavior. In [5], Kanlar et al. conducted a comparative study of an Artificial Neural Network (ANN) and SVM applied to statistical features from vibration data for fault diagnosis of ball bearings. In [6], Zhang et al. proposed a method for fault classification and prediction of the degradation of components and machines using vibration sensors. Their method used features extracted from the frequency-domain data to train an ANN that estimates the machine's Remaining Useful Life (RUL).
More recently, however, Deep Neural Network (DNN) methods have become more common for classification, mostly based on the Autoencoder (AE) and the Variational Autoencoder (VAE). In [8], Galloway et al. used an AE on spectrograms of raw vibration data from a tidal turbine and compared it to statistically extracted features used with SVM. In [14], Koizumi et al. took a statistical approach, using a Gaussian Mixture Model (GMM) for anomaly score estimation, and proposed a method to optimize a DNN-based feature extractor for anomalous sound detection using a VAE and an objective function based on the Neyman-Pearson lemma, applied to machine audio data. In [15], Koizumi et al. proposed end-to-end training using an AE for both the feature extractor and the normal model, with the Neyman-Pearson lemma as the objective function.
In [16], Kawaguchi and Endo used an end-to-end Long Short-Term Memory (LSTM) autoencoder on subsampled audio signals for anomaly detection. In [17], Oh and Yun used an autoencoder on sound data from a Surface-Mounted Device machine. To the best of our knowledge, only Purohit et al. [13] have used an unsupervised learning-based method on the MIMII dataset specifically. The authors used an AE on the log-Mel spectrum as the feature extracted from the audio frequency-domain data.
Despite the remarkable achievements of DNNs in machine anomaly detection and failure prediction, this category of methods remains complex and hard to explain because of its nested non-linear structure [22], and is therefore seen as a black box. In [23], Jardine et al. mentioned two main difficulties with neural networks: obtaining a physical explanation of the trained model, and the training process itself. To apply those methods, factories need to hire data scientists at a high cost, which can be complicated in situations that require extra data privacy procedures. These facts are the main limitation to bringing such technologies from research and academia to the industrial world.
Clustering-based methods are another widely used category. In a multidimensional space, abnormal data differs from normal data. A clustering-based method, or mapping tool, that properly projects the data from the multidimensional space into a lower-dimensional space makes it possible to distinguish abnormal data from normal data based on the data distribution. With this category of methods, it is therefore possible to visualize the data in a two-dimensional space and track signs of failure. Several methods have been proposed and proved successful, such as t-SNE [24] and UMAP [25]. toor Inc. proposed toorPIA [26], a novel clustering-based method. toorPIA uses hierarchical clustering and is an existing commercial product [26]. This method has shown good classification capabilities and high-speed processing, in addition to being easy to use.
Our proposed analysis method is based on a core analysis followed by the use of a mapping tool. Among the existing mapping tools, we chose toorPIA because it previously showed valid results on sound data [11]. The core analysis estimates the correlation between data instances, using the bare sound or vibration frequency-domain data to monitor the machine condition rather than complex models or selected statistical features. Our proposal therefore keeps the process simple while achieving accurate diagnosis and predictions.

Outline of the proposed failure predictive solution
We propose an IoT-based machinery failure predictive solution as a CBM predictive maintenance technique. Hence, our solution consists of the three phases [1]: data acquisition, data processing and decision making. In a prior work [11], we presented the system architecture of our solution. Our proposal has the common IoT-based system architecture described in Fig. 1. The data is collected from one or multiple sensor nodes and transmitted to cloud storage via a wireless network. Further data processing is then performed on the server using our proposed Big Data Analysis method for diagnosis and prediction.

The proposed big data analysis method
We propose a big-data-based analysis method that is applied to the retrieved sensor data. Our method consists of a pre-processing phase and a mapping phase.
The pre-processing phase consists of three steps: generating data vectors from the time-domain data, extracting the frequency contents of those vectors using the Fast Fourier Transform (FFT), and finally generating, from the spectral data, a similarity (distance) matrix that serves as the input of the mapping phase.
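The three pre-processing steps can be sketched in Python with NumPy. This is a minimal illustration rather than the implementation used in this work: the function names are hypothetical, and defining the distance as one minus the Pearson correlation between magnitude spectra is an assumption, since the exact correlation measure is not specified here.

```python
import numpy as np

def magnitude_spectra(segments):
    """Hann-window each time-domain vector and return one-sided FFT magnitudes."""
    segments = np.asarray(segments, dtype=float)
    window = np.hanning(segments.shape[1])
    return np.abs(np.fft.rfft(segments * window, axis=1))

def correlation_distance_matrix(spectra):
    """Pairwise distance d_ij = 1 - r_ij (assumed), r_ij being the Pearson correlation."""
    corr = np.corrcoef(spectra)      # each row is one spectrum
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)      # remove rounding noise on the diagonal
    return dist

# Toy data: three noisy 50 Hz tones and one 120 Hz tone, 1 s sampled at 1 kHz.
rng = np.random.default_rng(0)
t = np.arange(1000) / 1000.0
segments = [np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)
            for _ in range(3)]
segments.append(np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(t.size))

spectra = magnitude_spectra(segments)
dist = correlation_distance_matrix(spectra)
```

In this toy case, the three 50 Hz segments end up close to each other and far from the 120 Hz segment, which is the kind of structure the mapping phase is meant to reveal.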
The mapping phase consists of visualizing the multidimensional data in a two-dimensional space. As mentioned in the section Related works, several methods have been proposed and implemented, such as t-SNE [24], UMAP [25] and toorPIA [26]. Mapping methods aim to preserve the multidimensional distances to a maximum degree. In the present work, we chose toorPIA as the mapping tool. In toorPIA, correlation acts like gravity in the multidimensional space and brings highly similar data together into clusters. Using the similarity matrix, toorPIA therefore assigns a single coordinate pair $(x_i, x_j)$ to each data vector $F_i$, expressing the relative positions of the data vectors in a 2D map. The process flow of our analysis method is illustrated in Fig. 2.
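toorPIA is a commercial product and its algorithm is not reproduced here. As a stand-in only, the following sketch uses classical multidimensional scaling (MDS), a different and much simpler technique, to show how a distance matrix can be turned into 2D coordinates that approximately preserve the multidimensional distances. The function name and the toy matrix are assumptions for illustration.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]      # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))   # guard against tiny negatives
    return eigvecs[:, order] * scale

# Toy distance matrix: items 0-2 are mutually close, item 3 is far from all of them.
dist = np.array([
    [0.0, 0.1, 0.1, 1.0],
    [0.1, 0.0, 0.1, 1.0],
    [0.1, 0.1, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
])
coords = classical_mds(dist)
```

Each row of `coords` is the 2D position of one data vector. Items 0 to 2 land near each other while item 3 is placed far away, mirroring the cluster structure of the distance matrix.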

The proposed failure predictive scheme
The 2D map generated by toorPIA can be used for early detection of failure signs of the machine based on the prediction scheme detailed in Fig. 3.
First, we generate a 2D map from the accumulated data, which we refer to as the reference map. Since in most cases machines are designed to run in a healthy condition for a long period of time, the reference map generally consists of a single normal-state cluster or several such clusters. Therefore, we can form a reference map from day 1 data. Any data acquired later can be mapped into the reference map.
Then, the position of a newly plotted data point in the reference map indicates the present condition of the machine. If the machine maintains a normal operating condition, the newly plotted data points fall into the normal condition cluster. However, if the machine starts to deteriorate gradually, the new data points are plotted on the edge of the normal condition cluster and then move outside it. In the latter scenario, if the machine is in a serious pre-failure condition, the newly plotted data shifts entirely away from the normal condition cluster and forms a new well-defined cluster. Maintenance is then urgently required. If the newly plotted data shifts but then comes back into the normal condition cluster, the machine may have experienced some disturbance but there is no risk of failure, and no major maintenance action is needed.
We believe that this prediction scheme is able to track slight changes in the data as the machine drifts from a normal operating condition towards degradation, and can thus detect whether a failure is about to happen.
Moreover, the reference map can expand with all newly acquired data according to the machine condition. New clusters specific to each type of failure will appear gradually. Thus, instead of requiring complex training prior to detection, the intelligence of the method grows daily with the growth of the map.
In this paper, we focus on applying the proposed big data analysis scheme as a predictive tool to the already available dataset provided by [20].

Application of the predictive scheme on the valve data of the MIMII dataset
In the current work, we selected from [20] the data of valve 00, on channel 1, with a Signal-to-Noise Ratio (SNR) of 0 dB. The dataset comes in the form of 10-second-long segments rather than continuous time-domain data. We have 991 normal data segments and 119 abnormal data segments. As described in the previous section, the first step in our predictive scheme is to form a reference map and then add the test data to it. We chose to use only the normal data as reference data and the abnormal data as test data.

Analysis of normal data and generation of the reference map
Analysis
We start by analyzing the normal data only to form the reference map. We kept the vector length equal to 10 s, which allows us to cover the fluctuations in the valve sound data and the non-stationary nature of the signals mentioned by the authors in [13]. At a sampling rate of 16 kHz, a 10-second window results in a window size of $16 \times 10^4$ samples. This window size is not a power of 2 and therefore does not satisfy the FFT length requirement. Thus, we completed each vector with 102144 dummy data points to reach a window size of $2^{18}$. Each vector is then represented by (1):

$$(a_{i,1}, a_{i,2}, \ldots, a_{i,n}, d_{i,n+1}, d_{i,n+2}, \ldots, d_{i,m}) \qquad (1)$$

Where:
• $n$: the original vector window size, $16 \times 10^4$
• $m$: the closest power of 2 above $n$, $2^{18} = 262144$
• $a_{i,1}, a_{i,2}, \ldots, a_{i,n}$: the original vector data covering 10 seconds of valve operation
• $d_{i,n+1}, d_{i,n+2}, \ldots, d_{i,m}$: the dummy data, where $d_{i,j} = 0 \;\; \forall j \in [n+1, m]$

By adding dummy data, we obtained vectors that cover 10 seconds of valve operation and have a window size of $2^{18}$.
Next, we applied a Hanning window and the FFT to the obtained vectors to generate the frequency-domain data. Fig. 4 presents the average spectrum of all the normal data. The obtained spectrum is noisy and shows fluctuations. The frequency resolution of a spectrum is the spacing between two adjacent frequency components. In the present case, with a sampling rate of 16 kHz and a spectrum size of $2^{17}$, the frequency resolution of our spectra is 0.06 Hz. We assume that such a fine frequency resolution would decrease the accuracy of the evaluation of the correlation between the spectra. Therefore, we tuned the frequency resolution to about 15 Hz: since the original spectrum size is $2^{17}$, we averaged every 256 consecutive frequency components to obtain a smoother spectrum. We thus obtained spectra of size $2^9$ that still cover the 10 s of valve operational data.
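The numbers in this step can be checked with a short NumPy sketch. It follows the order described above (zero-padding, Hanning window, FFT, averaging of 256 adjacent components). The constant and function names are hypothetical, and dropping the Nyquist component to keep exactly 2^17 components is an assumption.

```python
import numpy as np

FS = 16_000        # sampling rate in Hz
N = 10 * FS        # 160,000 samples in a 10-second segment
M = 1 << 18        # 262,144, the closest power of two above N
GROUP = 256        # number of adjacent frequency components averaged together

def smoothed_spectrum(segment):
    """Zero-pad to M samples, apply a Hanning window, FFT, then average bins."""
    segment = np.asarray(segment, dtype=float)
    padded = np.pad(segment, (0, M - segment.size))      # M - N = 102,144 zeros
    windowed = padded * np.hanning(M)
    magnitude = np.abs(np.fft.rfft(windowed))[: M // 2]  # 2**17 bins (Nyquist dropped, assumed)
    return magnitude.reshape(-1, GROUP).mean(axis=1)     # 2**17 / 256 = 2**9 bins

raw_resolution = FS / M                          # about 0.061 Hz per bin
smoothed_resolution = raw_resolution * GROUP     # 15.625 Hz per averaged bin

rng = np.random.default_rng(0)
spectrum = smoothed_spectrum(rng.standard_normal(N))  # random stand-in for one segment
```

With these values, each segment is reduced from 160,000 time samples to a 512-component spectrum, which keeps the later correlation analysis cheap.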
Finally, we ran the correlation analysis on the obtained spectra. Then, using toorPIA, we obtained the reference map shown in Fig. 5.

Results and discussion
Fig. 5 shows that the normal data is distributed in 2 adjacent dense clusters, with the rest of the segments scattered around them. According to [13], the valves are solenoid valves that are repeatedly opened and closed with different timing. The authors also highlighted that the valve sound signals are non-stationary, and in particular impulsive and sparse in time.
We inspected the time-domain contents of the different areas of the reference map, as shown in Fig. 6. The valve had an operational cycle about 1 second long, consisting of: open/stay opened/close/stay closed. In some data segments, the valve completed 9 full cycles (Fig. 6.a), while in others it had fewer open/stay opened/close/stay closed events (Fig. 6.d) and therefore fewer operational cycles. Moreover, in several instances, even in normal condition, the amplitude of the valve sound signal alternated between amplification and attenuation (Fig. 6.b and Fig. 6.c).
Therefore, the distribution of the normal data on the reference map seen in Fig. 5 could be explained by the difference in the operational modes of the valve.

Adding abnormal data into the reference map
Analysis
In the current work, we developed a plotting algorithm that places the abnormal data on the reference map according to its multidimensional distance to the normal data. In our proposed prediction scheme, the reference map is not changed by the newly plotted data. Hence, the position of the new data allows us to track and detect any abnormalities.
First, we generate the test data vectors from the abnormal segments using the same window length and frequency resolution as for the reference data vectors. The vectors are thus 10 seconds long, and each spectrum has $2^9$ frequency components as dimensions and a frequency resolution of 15 Hz. Next, for each test vector, we evaluate the multidimensional distances to the reference data vectors and then use our plotting algorithm to project the data vector from the multidimensional space into the two-dimensional space. Fig. 7 shows all the abnormal data placed on the reference map.
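The plotting algorithm itself is not detailed here. One simple way to place a new point on a fixed map from its distances to the reference points is a linear least-squares fit, sketched below as a hypothetical stand-in. It assumes that the multidimensional distances are on the same scale as distances on the 2D map, which a real mapping tool does not guarantee.

```python
import numpy as np

def place_on_reference_map(dist_to_ref, ref_coords):
    """Least-squares 2D position of a new point from its distances to fixed references.

    For centered reference coordinates x_i and an unknown position y,
    ||y - x_i||^2 = d_i^2 expands to 2 x_i . y = ||x_i||^2 - d_i^2 + ||y||^2.
    The ||y||^2 term is a constant that drops out once the right-hand side is
    centered, leaving a linear system in y.
    """
    ref_coords = np.asarray(ref_coords, dtype=float)
    d2 = np.asarray(dist_to_ref, dtype=float) ** 2
    center = ref_coords.mean(axis=0)
    x = ref_coords - center
    rhs = (x ** 2).sum(axis=1) - d2
    sol, *_ = np.linalg.lstsq(2.0 * x, rhs - rhs.mean(), rcond=None)
    return sol + center          # the reference map itself is left unchanged

# Toy check: four reference points and a new point lying outside their square.
ref_coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_position = np.array([3.0, 0.5])
dist_to_ref = np.linalg.norm(ref_coords - true_position, axis=1)
placed = place_on_reference_map(dist_to_ref, ref_coords)
```

Unlike a weighted average of neighboring reference points, this kind of fit can place a new point outside the area covered by the reference data, which is needed for abnormal data to move away from the normal cluster.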

Results and discussion
The abnormal data was plotted at the edge of the reference map. Only a small portion of it overlapped with some of the normal data at the edge of the reference map, while the rest was plotted outside the reference map. Moreover, the abnormal data was distributed over 3 main compact clusters, as shown in Fig. 7.
In Fig. 8, we present the frequency contents of the 3 abnormal clusters and compare them to each other and to the frequency contents of the normal data. For this purpose, we generated the average spectrum of all the normal data spectra to obtain a general idea of the frequency distribution of the sound data when the valve operates normally. Additionally, we randomly picked a segment from each of the 3 abnormal clusters and observed its spectrum.
We found that the normal data spectrum showed higher amplitudes in the frequency domain (Fig. 8.a) than the abnormal data (Fig. 8.b, Fig. 8.c and Fig. 8.d). Moreover, Fig. 8.b, Fig. 8.c and Fig. 8.d show that the 3 spectra from the 3 abnormal clusters differ in their frequency distribution. Two hypotheses could explain these differences between the abnormal spectra, and thus the distribution of the abnormal data over 3 clusters on the reference map:
• Hypothesis 1: the 3 clusters correspond to 3 types of contamination.
• Hypothesis 2: as for the normal data, the 3 clusters correspond to different operational routines of the valve, that is, to different numbers of open/stay opened/close/stay closed cycles.
At the current stage, we cannot yet confirm which of the 2 hypotheses dominates the structure of the abnormal data clusters. However, if further attribute information is provided by the authors of the dataset, we could confirm which of the hypotheses dominates the distribution of the abnormal data on the map.

Analysis of the valve normal and abnormal data combined
In this section, as in the previous one, we used the data of valve 00 from the MIMII dataset [20], on channel 1 and with an SNR of 0 dB. However, here we analyze the total data (normal and abnormal) to inspect how the data is actually distributed.

Analysis
We start by forming the data vectors from the normal and abnormal data segments using the same parameters as in the previous section. The vectors are thus 10 seconds long, and each spectrum has $2^9$ frequency components as dimensions and a frequency resolution of 15 Hz. Then, we ran our correlation analysis on the total data (normal and abnormal) to estimate the multidimensional distances. Finally, we used toorPIA to obtain the 2D map shown in Fig. 9.

Results and discussion
From Fig. 9, we can see a full distinction between the normal data and the abnormal data. The distribution of the normal data changed when it was analyzed together with the abnormal data, compared to when only the normal data was analyzed. The abnormal data, on the other hand, was distributed mainly over 2 clusters. This separation of the abnormal data into 2 sets can be explained by a difference in contamination modes. Alternatively, the difference between the two clusters can be explained by the number of open/stay opened/close/stay closed operations, as for the normal data distribution. In the present work, we are not yet able to confirm that we can effectively differentiate between the types of contamination. However, we assume that, since different contaminations would affect the valve operation differently, the frequency characteristics of the valve operation should differ between contamination modes, and the data would thus be plotted differently on the reference map.
Comparing the results of the reference map scheme obtained in the previous section with the results of mapping all of the data together, we found that in the former case some of the abnormal data overlapped with the normal data, while in the latter case there was no overlap between them. This can be explained by the fact that, in the multidimensional space, normal and abnormal data do not overlap each other. In the data mapping phase, in other words the dimension reduction phase, we aim to preserve the distances between the data to a maximum degree. Therefore, when all of the data is mapped together, the normal data is well separated from the abnormal data on the map. In the reference map scheme, however, we first map the normal data to form a reference map and then, separately, map the abnormal data onto the reference map. As a result, the newly mapped data is distributed over a wider area and some of the data overlaps with the normal data.
Therefore, to further improve the accuracy of the reference map scheme, we can periodically update the reference map. In other words, we generate a new reference map that includes, besides the normal data, the abnormal data once the associated abnormality mode has occurred in the actual machine operation. This expands the reference map to include the normal data and the abnormal data already obtained, so that our failure detection scheme grows more intelligent in real circumstances. This feature is called "Progressive Intelligence", as proposed in our previous work [11].

Discussion
In the current work, by forming a reference map from normal data and plotting new data into it, we successfully placed abnormal data at the edge of, as well as outside, the normal data area. Using the proposed prediction scheme, we thus showed that we can track and detect the signs of failure.
Consequently, based on how the abnormal data was distributed on the resulting reference map, we propose the following method to automatically detect the signs of failure from newly acquired data. From the existing reference map, we start by defining the normal region. The abnormal data moves according to the degradation state. As degradation progresses, we can compute the center of gravity of the newly plotted abnormal data that has moved out of the normal region. We then define a threshold on the distance of this center of gravity from the normal region.
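A minimal sketch of such a detection rule follows. Everything specific in it is an assumption for illustration: the normal region is modeled as a disc around the centroid of the reference points with a percentile radius, the function names are hypothetical, and the threshold value is arbitrary.

```python
import numpy as np

def normal_region(ref_coords, percentile=95.0):
    """Model the normal region as a disc: centroid plus a percentile radius (assumed)."""
    ref_coords = np.asarray(ref_coords, dtype=float)
    center = ref_coords.mean(axis=0)
    radius = np.percentile(np.linalg.norm(ref_coords - center, axis=1), percentile)
    return center, float(radius)

def failure_sign(new_coords, center, radius, threshold):
    """Return (alarm, excess) for the new points that left the normal region."""
    new_coords = np.asarray(new_coords, dtype=float)
    outside = new_coords[np.linalg.norm(new_coords - center, axis=1) > radius]
    if outside.size == 0:
        return False, 0.0
    gravity = outside.mean(axis=0)     # center of gravity of the points outside
    excess = float(np.linalg.norm(gravity - center) - radius)
    return excess > threshold, excess

# Toy reference map: a regular grid of "normal" points in [-1, 1] x [-1, 1].
grid = np.linspace(-1.0, 1.0, 11)
ref_coords = np.array([(x, y) for x in grid for y in grid])
center, radius = normal_region(ref_coords)

alarm, excess = failure_sign([(5.0, 5.0), (5.2, 4.8)], center, radius, threshold=1.0)
quiet, _ = failure_sign([(0.1, 0.0), (-0.2, 0.3)], center, radius, threshold=1.0)
```

Here `alarm` is raised because the center of gravity of the new points lies well beyond the normal disc, while `quiet` stays false because no new point left the normal region.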
In our future work, we will implement and test this method to design a filter in a real-world-like scenario. In such a situation, the reference map would be extended as described in the section Analysis of the valve normal and abnormal data combined, and different degradation modes could be detected accordingly.
In the present work, we used an initial plotting algorithm to add the abnormal data to the reference map based on the multidimensional distances. However, a more sophisticated method is being developed as a toorPIA feature. In our future work, we will use this new feature to improve the accuracy of the projection of the test (abnormal) data from the multidimensional space onto the reference map.
Additionally, we are planning to tune certain key analysis parameters to study their effect on the output of the analysis scheme.

Conclusions
The present work is an application and a validation of our proposed failure-predictive solution on the MIMII dataset [20] as a real-world-like dataset. Based on the obtained results, we can conclude that:
1. Despite the complexity of, and the limited knowledge about, the monitored machine (here, the valve), we successfully differentiated between normal and abnormal data without any modification of the core of our analysis scheme. Hence, our hypothesis in [11], that the core analysis can remain effective for failure detection using vibration or sound data for different systems, has been validated.
2. Using our predictive scheme, detecting signs of failure and abnormal machine behavior is possible from day 1 and without prior knowledge of the data.
3. We could not strictly confirm that we distinguished between the valve contamination modes. However, we assume that our detection scheme would differentiate not only the contamination modes but also operational variations, if further attribute information were available from the authors of the dataset to validate our results.
In the present work, we applied our proposed failure-predictive scheme to valve sound data. However, our proposal can be applied to other types of IoT data such as vibration, flow fluctuation, electric current, temperature, chemical contents, etc.