Multifractal detrended fluctuation analysis of insole pressure sensor data to diagnose vestibular system disorders

The vestibular system (VS) is a sensory system that has a vital function in human life by serving to maintain balance. In this study, multifractal detrended fluctuation analysis (MFDFA) is applied to insole pressure sensor data collected from subjects in order to extract features to identify diseases related to VS dysfunction. We use the multifractal spectrum width as the feature to distinguish between healthy and diseased people. It is observed that multifractal behavior is more dominant and thus the spectrum is wider for healthy subjects, where we explain the reason as the long-range correlations of the small and large fluctuations of the time series for this group. We directly process the instantaneous pressure values to extract features in contrast to studies in the literature where gait analysis is based on investigation of gait dynamics (stride time, stance time, etc.) requiring long walking time. Thus, as the main innovation of this work, we detrend the data to give meaningful information even for a relatively short walk. Extracted feature set was input to fundamental classification algorithms where the Support-Vector-Machine (SVM) performed best with an average accuracy of 98.2% for the binary classification as healthy or suffering. This study is a substantial part of a big project where we finally aim to identify the specific VS disease that causes balance disorder and also determine the stage of the disease, if any. Within this scope, the achieved performance gives high motivation to work more deeply on the issue.


Introduction
The vestibular system is a perceptual system with the mission of providing intel to our brain about spatial orientation, head position and motion.Besides, it is also filled with motor functions to preserve balance, to counterbalance head and body while we move and to keep our posture stable [1].Even though there are numerous studies about a number of issues in various branches in the medical field, it is hard to consider detection of vestibular disorders among the prominent areas of interest.The relative scarcity of studies in such an important field of medicine is the main motivation of this study.
To analyze and diagnose balance disorders, various methods have been experimented on different types of data.Among those, human gait dynamics are frequently used to diagnose neurodegenerative diseases (NDDs) such as Parkinson disease (PD), Huntington disease (HD) and amyotrophic lateral sclerosis (ALS).As an example, to diagnose PD, Abdulhay et al. worked on gait dynamics.They filtered the force data obtained from Physionet database (created using 8 force sensors under each foot) and applied peak detection and pulse duration measuring techniques to extract features such as stride time, stance time, swing time and foot strike profile [2].Daliri also used these features and added double support interval to extract even more features.He then applied genetic algorithms to select appropriate features and classified NDDs with the obtained feature set [3]. Sarbaz et al. exploited the semi periodic structure of human gait and its fractal properties to model PD and healthy gait dynamics with sine-circle map method.They managed to distinct two patterns with the model parameter "Ω" [4].Zeng and Wong used radial basis function neural networks to obtain gait patterns and performed classifications with various estimators making use of deterministic learning to diagnose NDDs [5].
We observe that studies on detecting VS dysfunctions causing balance disorder are not carried out as intensively as they are concentrated on NDDs.In this area, the wellknown computerized dynamic posturography (CDP) is still the most widely used technique; however, alternative methods are spreading fast in recent years.Studies in the literature do not only concentrate on the gait analysis, but we also meet different approaches.In this context, Imai et al. analyzed eye rotation vectors of samples to diagnose benign paroxysmal positional vertigo (BPPV) [6].Lang et al. performed 3D motion analysis to evaluate locomotor pattern and body's oscillation during gait under different conditions in VS patients [7].Bergeron et al. analyzed the effect of virtual reality tools on detection of VS disorders from different aspects [8].Sang-I et al. measured the balance performance when responding to visual stimuli in subjects with BPPV [9].Auvinet et al. applied dual task gait analysis to detect gait disorders related to several diseases including VS-based in the elderly people [10].
In our study we decided to extract features related to gait.The main reason behind this decision was that we aimed for wearable sensors to ensure that patients do not have to be tested in a clinical setting and are tracked in their daily lives.In the literature, gait analysis is carried out mainly based on three approaches.These are: Motion capture/image processing, where gait will be analyzed by processing the camera records; using force sensors, either placed under the sole of the subject or by utilizing floor sensors; making use of inertial/motion sensors, placed on appropriate points on the body [11][12][13].The collected data will then be used to extract features based on both the frequency and time domain analysis such as step length, step width, speed, foot ascension from ground etc. [14].Within this frame, Jarchi et al. presented a review on accelerometry based gait analysis [15].Ricciardi et al. applied different machine learning algorithms to classify Parkinsonism using gait analysis' parameters where they used 16 features [16].Sama et al. analyzed the body sensor data to find spatio-temporal properties of human gait [17].Zhao et al. developed a dual channel Long Short-Term Memory (LSTM) based multi-feature extraction method for gait analysis [18].Tao et al. reviewed available wearable sensors and ambulatory gait analysis methods based on various wearable sensors [19].
In fact, the experimental setup and the raw data collected in many of the studies about gait analysis using insole pressure sensors are quite similar with our work.However, as a novelty brought in our study, we used instantaneous pressure data to extract features directly rather than to use these data to calculate specific time intervals (such as stride time, stance time, swing time etc.) associated with the gait.We analyzed the changes in the pressure values on each insole sensor to extract information about the change in the behavior of giving strength on different parts of the foot during walking.
Regarding the feature extraction method, fractal/multifractal time series analysis, MFDFA and detrended fluctuation analysis (DFA) have been applied on different fields/ cases in the literature such as EEG, stock market, wind speed, heart rate dynamics, precipitation amount, medical images, traffic, human gait etc. [20][21][22][23][24][25][26][27][28].Common characteristic of all these signals/data is that they either have fluctuations on a wide range of time scales or their values show broad distributions so that their behavior will be described with a power law.As an application of the fractal time series analysis on gait data, Hausdorff et al. illustrated that human walking and random walking demonstrate severe/long-range correlations [29].They claimed that healthy people have a more compact time series compared to diseased people and they also observed a similar discrepancy between young and elder individuals, which was later also supported by the study of Diosdado et al. [30].Dutta et al. used MFDFA to analyze stride intervals of control, HD and PD groups.They showed that there is a big difference in spectrum widths of diseased and healthy people [20].
In the whole of our project about identifying the specific disease behind the balance disorder sourcing from VS problems, we investigate the contributions of features extracted from data collected by both the inertial and force sensors and we also perform analysis both in frequency and time domain [14,[31][32][33].The main reason behind the choice of time series analysis method and specifically the multifractal analysis in this study is the fact that gait data present multifractal scaling behavior as mentioned above.A careful literature research puts forth that although human gait appears to have regular dynamics, it actually incorporates small and large fluctuations which are long range correlated in healthy case [20,26,29,30].In our study, this feature is expected to provide useful information to capture important features to be used for classification.On the other hand, to extract useful information from individual samples, in other words, to reveal small fluctuations, we need to detrend the data.Therefore, among multifractal analysis we decided on MFDFA as the method to apply.
To summarize the contributions of our work: Quantitatively, studies to determine the diseases based on VS dysfunction are insufficient in the literature.Additionally, almost all studies have focused on similar features of gait analysis, which require a relatively long walking time.This study searches for new features to identify VS related diseases where we aim to increase the accuracy in the detection and also shorten the data acquisition period.In this context, MFDFA has been applied to instantaneous pressure-sensor data collected from wearable insole sensors.This also allows data to be obtained in daily life that helps the patient to escape the stress of the clinical environment and thus avoids adverse effect of the process on the accuracy of the diagnosis.
After feature extraction by applying the MFDFA, a model was trained by the feature set using various classification methods such as SVM, Decision Tree (DT), k-Nearest Neighbor (KNN) etc.To describe the content and flow of the work clearly, the rest of this article is arranged as follows: In Materials & Methods section the MFDFA is described briefly.This section also provides detailed information about the data acquisition process.In the next section we submit the numerical results of the experiments conducted.Finally, we discuss the results and draw conclusions from the study and mention about the future work regarding the use of the outcomes of this study in the whole project.

Applied method: the MFDFA
Briefly, MFDFA consists of five steps.At the first step, the profile of the time series is calculated by subtracting the mean value from each sample and taking the cumulative sum of the samples.As the second step, the new time series or the so-called profile is divided into non-overlapping segments of equal lengths.Next, each segment gets least square fitted and corresponding values of the regression curve are subtracted from the individual values within the segment.Following this step, the variance of each segment is calculated.As the fourth step, qth order fluctuation function is calculated for each segment.The steps two to four are repeated for different q and s (segment size) values.At the final step, log-log plots of the qth order fluctuation function and segment sizes are analyzed; scaling exponent and singularity strength are calculated and the multi-fractal spectrum is drawn.The width of the spectrum demonstrates the amount of multi-fractality of the time series.
Below we summarize the mathematics behind this method which was developed by Kantelhardt et al. in 2002 [34].
Step 1: Determining the profile.
Let us suppose a time series x ( i ) with i = 1 … N ( N : length of the series).Now, the profile is obtained as where x av is the mean of the time series. (1) Step 2: Dividing the profile into segments.
The profile Y(i) of the length N is divided into N s non- overlapping segments which are all equally sized with s.But since in general N will not be an integer multiple of s, the procedure about the division of the time series into segments will also be run backward in order to cover all samples; this is, starting with the last sample of the series and propagate backward toward the first one.Thus, we finally have 2N s segments, each of the size s.
As the next step, the variance of each segment is calculated.The variance is a measure of the power of the tendency to deviate from the mean.For dynamic systems, where we cannot refer to a constant mean value, we are interested in the deviation from the trend, so we need to determine the trend within the segment in order to speak about a meaningful deviation.The trend is determined and a regression curve is obtained by least square fitting and then for each sample, the value on this curve will be subtracted from the corresponding sample value within the segment.This process is called detrending.The variance will then be calculated as In these equations y v (i) is the least square fitted poly- nomial for each segment v .The choice of the degree m of the polynomial is optional and case dependent; in our case we chose m = 2, since the profiles were illustrating piecewise near-parabolic behaviors.We also performed calculations for m = 3 to verify our foresight and obtained similar results as for m = 2.
Step 4: Calculating the fluctuation function.
Multi-fractality in time series is an indicator of the existence of small and large fluctuations in contrast to mono-fractal time series, where 2nd order statistical moments give sufficient information about the series.Thus, for the multifractal case, qth order statistical moments have to be considered where q is not limited with 2 [35]. (2) Here, q can take any real value; for q → 0, Eq. 5 is used since 1/q in Eq. 4 approaches to infinity for this condition [34,35].
where h(q) is the so-called Hurst exponent.
Step 5: This process is repeated for various s and q values, so that the relationship between F q (s) and h(q) is illustrated on log-log plot.If the series is long range correlated, we observe a power law behaviour as [20] It is easy to see that if this scaling exists, ln(F q (s)) will linearly depend on ln(s) with the slope of generalized Hurst exponent h(q).Generally, for monofractal time series h(q) is identical for all q values; in other words, the logarithmic relationship between F q (s) and s is not dependent on q for monofractal time series [36].Since multifractal series cannot be expressed by a unique h(q), a spectrum of generalized Hurst exponents is required to interpret the multifractality of time series.
In order to obtain the spectrum of generalized Hurst exponents or the so-called multifractal spectrum, we go the following steps: First, h(q) is related to the scaling exponent τ(q) via Eq. 6. [29] For mono-fractal series, τ(q) depends linearly on q since h(q) is constant, whereas multifractal series illustrate nonlinear relationship between τ(q) and q while h(q) varies.
Once the scaling exponent is found, the singularity strength α is calculated and the singularity spectrum f(α) is defined as The multifractal spectrum width provides crucial information about the time series.The width of the spectrum defines the range of the exponents and is calculated as In this study, W is taken as the feature to determine the class the subject belongs to. (5)

Data acquisition process
Examining the studies in the literature on gait analysis puts forth that the weight distribution on soles is considerably effective at four major points as illustrated in Fig. 1a [37][38][39][40][41].We also decided on the aforementioned sensor placement by following the past studies and consulting with authorized academicians (Please see: Acknowledgments section) in this field.
Considering the importance of collecting data without affecting one's natural walking habits, 5 pairs of insoles of different sizes (36,38,40,42, were manufactured with appropriate sensor placement.Before starting the experiment, the correct size insoles were placed inside the shoes of the subjects.In the production of insoles, we utilized of durable and soft plastic material widely used in the production of orthopedic products. For the pressure sensors we chose force-sensitive resistors (FSR) mainly used in gait-analysis applications due to their superiority regarding several aspects [42].We decided on FSR402-Short tail from Interlink since this sensor is very suitable for insole placement due to its physical dimensions and furthermore it has acceptable repeatability error [43].The characteristics of the sensor are listed in Table 1 and Fig. 1b illustrates the numbering of the sensors S0 to S7 on the insoles.The data were collected from subjects in the clinical environment in the Audiology Department of Cerrahpaşa Medical School in Istanbul University-Istanbul-Türkiye.The study was performed in accordance with the principles expressed in the Declaration of Helsinki.The experiments were performed after obtaining the approval of Istanbul University-Ethics Committee (Approval number: A-57/07.07.2015) and informed consent was obtained from all subjects and/or their legal guardian before starting the process.For the suffering group members, their VS problems were already determined by other systems such as the Dynamic Posturography under the supervision of audiologists.Data collection was carried out on weekends.The reasons for this were both to reduce the stress of the subjects as much as possible and to prevent the data collection systems from being affected by other devices running in the environment.Data were recorded as the subjects walked along a 12 m long track inside the clinic.Subjects were asked to walk the path twice in order to get away from stress on the first walk and to obtain more meaningful data for the study in line with the recommendations of the audiologists.Thus, we evaluated the data from the second walk.
Data collected via the pressure sensors were first captured by an Arduino Mega unit carried by the subject and then transferred to a laptop nearby wirelessly via HC-06 Bluetooth module.20 samples per second were taken from each sensor, in parallel from all sensors each time.Our experiments/observations revealed that the sampling frequency of 20 Hz is quite sufficient for subjects with VS dysfunction to capture the deviation from the trend, as the MFDFA method is based on; and this has also been confirmed by audiologists.
To give a voltage output, a voltage divider was constructed via a 1kΩ resistor in series with the sensor; the divider was supplied by 5 V DC.Though the used pressure sensor has a high repeatability, its pressure-resistance characteristic is non-linear.Thus, the device was calibrated in the lab using known weights.Finally, the equation of the regression curve giving the relationship between the weight/force w [N] and the output voltage v [V] of the voltage divider was obtained as (10) w = exp((v + 0.2245)∕0.9265) The calibration process was repeated for each sensor used in the experiments.
Information about the subjects participated in the experiments is presented in Table 2.
The distribution of the diseases of suffering subjects determined via computerized dynamic posturography under the supervision of audiologists is given in Table 3.
The identification of all participants has been made anonymous for the publication of this paper.

Data processing
The captured data were first preprocessed before features were extracted from them.At this stage, the data about the first and the last steps were removed from the whole gait data since these data are far from giving correct information; this is because the mentioned steps do not involve in full dynamic behavior.The total number of samples differs from subject to subject, depending on stride length and walking speed.Regarding this point, segment scales were set proportional to step sizes for each subject.Hence, the segment lengths differ from subject to subject, but they are proportional to step sizes in each case.
The corresponding pressure values will be around zero while the foot is in the air.The samples taken during these intervals have to be removed from the whole dataset in order to consider the true time series and thus avoid erroneous measurements.For this reason, precise thresholds were determined to detect that the foot did not touch the ground.
Obviously, each subject weighs differently.On the other hand, we are interested in the deviation of the sample values from the trend within the segment.It is necessary to handle relative deviations rather than absolute ones to give a meaningful idea.Thus, we normalized each sensor dataset before proceeding to profiling.For the normalization process the reference within each subject was taken as the sum of the readings of all sensors of one foot at a moment the foot touches the ground.The idea behind this step is that the abovementioned reference value is proportional to the weight of the subject.
As explained in Section 'Applied Method', the spectrum width was determined as the feature to indicate the distinction between classes.The widths were calculated by following the steps described about MFDFA.We had eight sensors in total; thus we obtained eight features, one spectrum width for each sensor data.In order to find out the most significant ones and reduce the number of features, we applied feature reduction techniques.For the dimensionality reduction, T-test and Sequential Backward Selection (SBS) techniques were used as the feature selection methods and we observed that both performed almost equally.Finally, the feature set was reduced to 6 components with the elimination of features related to sensors placed under the heels.This is understandable, as the data from sensors placed under the heel doesn't fluctuate much around the regression curve even for the VS-diseased people.We used Python's Scikit-Learn module V 0.24.1 (on Dell-Inspiron 13-7359) to find out the classifiers with highest performances.We applied tenfold cross validation technique where approximately 25% of data (from 16 subjects) served for testing and the rest (from 45 subjects) for training.We tested nine classifiers as: DT, Logistic regression, SVM (Linear), SVM (Quadratic), SVM (Gaussian), KNN, KNN (weighted), Bagged tree, Ensemble subspace KNN.We chose these nine out of a large number of classifiers to observe the difference in performance between algorithms as different from each other as possible.
Among the nine classifiers tested, SVM (with Gaussian kernel), KNN and DT were the top 3 performers.The basics of these classifiers can briefly be explained as follows: The KNN algorithm checks the k nearest neighbors to the object/vector to decide about the class it belongs to [44].DT extracts conditional statements for attributes and tries to split the nodes and finds terminal nodes to classify new samples [45]; SVM tries to fit an optimal hyperplane to separate data clusters [46].For KNN, the number of neighbors was selected as k = 5 and Euclidean distance was chosen as the metric distance.For DT, minimum sample number for leaf nodes was selected as 1 and minimum number of samples for a node to split was selected as 2.These three algorithms are also among the most frequently used ones in the literature when working on biomedical signals [47][48][49][50][51][52].

Results
In this section, the entire process is covered on a sample dataset.
Figure 2 illustrates sample raw and preprocessed data for healthy and diseased subjects together with corresponding profile curves (Sensor S2).The multifractal spectrums for the data and the corresponding profiles described in Section 'Materials & Methods' are illustrated in Fig. 3 together with curve fittings of 2nd order polynomials.
The spectrum widths in Fig. 3a and b are approximately W H = 2.75-0.82= 1.93 and W VS = 2.13-0.66= 1.47, respectively.That is, W has a greater value for the healthy subject than for the VS-diseased subject.Detailed information about the W values calculated from the data obtained within the scope of this study is given in Fig. 4, where the graph is plotted using data from sensors for only one foot (The right foot: Sensors S1, S2 and S3).Data from the S0 sensor placed under the heel is not included in Fig. 4 because this sensor does not provide useful information about the whole process as described in the 'Materials and Methods' Section.
As described in 'Materials & Methods' Section, nine classifiers were trained using the spectral widths obtained from the data of 6 of the 8 insole sensors, excluding the heel sensors.The average accuracies of the classification algorithms are listed in Table 4. Confusion matrices and corresponding Receiver Operating Characteristic (ROC) curves for one of the ten training-test set pairs for the top three classifiers are presented in Table 5 and Fig. 5, respectively.

Discussion
In this study, MFDFA was applied to insole pressure sensor data as a time-series based method to analyze human gait to classify healthy and VS diseased people.In this context, the spectrum width was used as the feature to distinguish between the classes.As the main contribution of this work, MFDFA was applied on detrended data, which enables analysis with samples taken from a relatively short walk.In this study, the required walking time was around 10-15 s, significantly less than those in most studies in the literature, where it is 60 s in [2], 5 min in [18] and 2 min in [53].Data were collected from totally eight insole resistive force sensors, four placed in each insole.The number of features were reduced to six after using dimensionality reduction techniques.Multifractal properties were observed for both the healthy and suffering groups; however, the spectrum was wider for the healthy group.The main reason behind this fact we explain as the loss of correlation in the pressure data for VS diseased people.As we know, one main reason of multifractality is the existence of different long range correlations of small and large fluctuations in the time series.Thus, it is inevitable that healthy people present a wider spectrum since their gait is more regular and correlated, while for the diseased people randomness dominates.
The six features each derived from one sensor dataset were used to train several machines for classification between the two groups as 'healthy' or 'suffering'.Among the 9 classifiers tested, the SVM with Gaussian kernel performed best with an average accuracy of 98.2%.At this point, considering the limited number of subjects involved in the process, this accuracy is considered to be quite satisfactory.We strongly believe that an increase in the number of subjects will have considerably positive effect on the accuracy.The study will extend to distinguish between different VS dysfunction-based diseases together with the stage of the disease.We look for features of primary significance to be used in this process.After determining useful features that can be obtained from individuals in their daily lives, a feature reduction process will be followed to determine the most distinctive ones for diagnosis.Currently, we have set a minimum accuracy of 90% as the criterion by which any feature will be included in the reduction process.According to this criterion, the features obtained with the MFDFA application on the data collected from the insole pressure sensors exceed the desired level with an average of 98.2% accuracy.Although the objectivity of this criterion is debatable, our justification for the level in question is based on another study, where we already achieved 89.2% of accuracy [14].In [14], we investigated the effectiveness of 22 features on accuracy based on data collected from IMU sensors, all located in different parts of the body (feet, knees, waist).In b    [14] we also compared the performance of different feature reduction techniques.As we mentioned in 'Data Processing' section, we applied tenfold cross validation technique to determine the performance of our method.In Table 6 we provide additional statistical data about our experiments, where we give the average values for the accuracy, sensitivity, specificity, F1 score and Matthews correlation coefficient (MCC) for the top two performers which present accuracy above 90% (SVM-Gauss.kernel & KNN).We recall that these parameters are defined as follows: Accuracy: Sensitivity (True positive rate):  where TP, TN, FP and FN stand for true positive, true negative, false positive and false negative, respectively.In our case, 'positive' refers to the diseased, while 'negative' refers to the healthy group.Among the state-of-the-art methods, deep learning techniques also find application in this area.In [54], the authors report an accuracy of 96.36% in separating VS patients from the healthy group in their study in which they applied the CNN (convolutional neural network) technique, considering only the patient cluster with nystagmus.They announce the sensitivity and specificity values as 95.85% and 97.18%, respectively.In [55], the authors reveal that machine learning methods considering different features outperform univariate scores in diagnosing VS-based diseases.In line with this view, we are working on a project to diagnose the problems of people with VS dysfunction with the help of machine learning, in which various features are evaluated.
In the whole of our project on VS disorder-sourced disease diagnosis we mainly plan a two-step process where the first step will be a binary classification as 'healthy' or 'suffering' and the second one will sub-classify between diseases in case the first step ends up with the decision 'suffering'.The classification performance using the derived features based on multifractality of time series gives big hope for the features to be used at least in the first step.
As we mentioned above, the study will extend to determining the stage of the disease, if any.We are very hopeful about the effectiveness of the MFDFA method in this future step of the project.The reason for our opinion is that MFDFA is based on one's own walking trend and deviations from this trend are taken into account so that assessments can be easily correlated with the stage of the disease.

Fig. 1 a
Fig. 1 a Sensor placement on the insole b The numbering of the sensors S0 to S7 (Top view)

Fig. 2 a
Fig. 2 a S2 Sensor sample raw data for a healthy subject.b S2 sensor sample raw data for a diseased subject.c Preprocessed data derived from that illustrated in (a).d Preprocessed data derived from that

Fig. 3 a
Fig. 3 a Multifractal spectrum of the data given in Fig. 2c.b Multifractal spectrum of the data given in Fig. 2d

Fig. 4
Fig. 4 Spectrum width (W) values obtained from data from right foot sensors (H: Healthy, VS: VS-diseased)

Table 1
Characteristics of the sensor FSR402-short tail

Table 2
Information about the subjects

Table 3
The distribution of the diseases of suffering subjects

Table 6
Some statistical values related to classification