Data Acquisition
The experiment was carried out from January 2019 to December 2021 (36 months) in a gas delivery platform facility. Eight gas sensors were used to collect the dataset, and two gases (ethanol and ethylene) were tested. The flow rate of each gas was 100 ml/min with an accuracy of ±1%. All gases were stored in pressurized gas cylinders. Synthetic air was used as the background gas, and the two test gases were added to it in all experiments. The background gas alone was also tested. To obtain stable and effective results, the response of the gas sensor array was measured with the operating temperature of the sensors fixed at 400 °C, based on the best detection parameters provided by the manufacturer.
The data collected in months 1, 4, 14, 16, 20, 22 and 36 were used for the drift compensation research. The data of month 1 were taken as the benchmark and considered drift-free. The data of month 4 showed the characteristics of the initial stage of drift, the data of months 14, 16, 20 and 22 showed those of the intermediate stage, and the data of month 36 showed those of the end stage. The number of samples collected in each month for the two gases is shown in Table 1 below.
Table 1
The number of samples collected in each month for the two gases
| Month ID | Ethanol samples | Ethylene samples | Total samples |
|---|---|---|---|
| Month 1 | 84 | 88 | 172 |
| Month 4 | 82 | 170 | 252 |
| Month 14 | 52 | 43 | 95 |
| Month 16 | 28 | 40 | 68 |
| Month 20 | 264 | 100 | 364 |
| Month 22 | 30 | 30 | 60 |
| Month 36 | 600 | 600 | 1200 |
Feature extraction
Feature extraction [16] is an important preprocessing step for exploring the characteristics of the data and a necessary step in real applications [17]. Figure 1 shows a typical response curve of a gas sensor, which is also the data from which the features are extracted.
According to Fig. 1, in the gas injection (adsorption) phase the curve first increases and then becomes stable. In the cleaning (desorption) phase the curve first decreases and then becomes stable. Furthermore, the trends of the original curve and the drifted curve are similar; they differ mainly in the stable value and the slope of the response curve. The first step of this study is to find a rule that describes this difference.
In addition, the trend of the desorption phase depends largely on the characteristics of the adsorption phase. Hence, to reduce the amount of computation, only the adsorption-phase data of the response curve are used for feature extraction. Four features (one steady-state and three transient features) are selected for the drift compensation research in this study.
The steady-state feature is defined as the difference between the maximal response value and the baseline.
$${\text{Fs}}={\text{Max}}(R) - {\text{Min}}(R) \tag{1}$$
where \({\text{Fs}}\) represents the steady-state feature, \({\text{Max}}(R)\) is the maximal response value and \({\text{Min}}(R)\) is the baseline of the response curve.
Its normalized version is the ratio of this difference to the baseline value.
$$\left\| {{\text{Fs}}} \right\|=\frac{{\text{Max}}(R) - {\text{Min}}(R)}{{\text{Min}}(R)} \tag{2}$$
where \(\left\| {{\text{Fs}}} \right\|\) represents the normalized steady-state feature, which is used as Feature 1.
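As a minimal sketch of Eqs. (1) and (2), the snippet below computes the steady-state feature and its normalized version for one adsorption-phase response curve. The function name `steady_state_features` and the sample values are hypothetical, and the baseline is assumed to be the minimum of the curve, as in the definition above.

```python
import numpy as np

def steady_state_features(response):
    """Return (Fs, normalized Fs) for one adsorption-phase response curve."""
    r = np.asarray(response, dtype=float)
    baseline = r.min()  # Min(R): baseline of the response curve
    peak = r.max()      # Max(R): maximal response value
    if baseline == 0:
        raise ValueError("baseline is zero; normalized feature is undefined")
    fs = peak - baseline      # Eq. (1)
    fs_norm = fs / baseline   # Eq. (2), used as Feature 1
    return fs, fs_norm

# Hypothetical response values in arbitrary units
curve = [2.0, 2.5, 3.5, 4.0, 4.0]
fs, feature1 = steady_state_features(curve)
```

Because Feature 1 is a ratio to the baseline, it is dimensionless and does not depend on the unit of the raw response.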
The transient feature reflects the dynamics of the sensor response and is evaluated through the equation below.
$${\text{Ft}}_{n}=\frac{R_{n+\alpha} - R_{n}}{R_{n}} \tag{3}$$
where n is the detection time point, \({\text{Ft}}_{n}\) is one unit of the transient feature (the transient feature is the average of these units over n), \(R_{n}\) is the response value at detection time point n, \(R_{n+\alpha}\) is the response value at detection time point \(n+\alpha\), and \(\alpha\) is the moving scalar, set to 0.1, 0.01 and 0.001 in turn. Three transient features are thus calculated, one for each moving scalar.
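The transient feature of Eq. (3) might be computed along the following lines. This is only a sketch: the text does not state how the moving scalar maps onto sample indices, so the code assumes that \(\alpha\) is a fraction of the curve length and converts it to an integer offset of at least one sample. The function name `transient_feature` and the synthetic curve are likewise hypothetical.

```python
import numpy as np

def transient_feature(response, alpha):
    """Average of Ft_n = (R_{n+alpha} - R_n) / R_n over the curve (Eq. 3).

    Assumption: alpha is a fraction of the curve length and is converted
    to an integer sample offset of at least 1.
    """
    r = np.asarray(response, dtype=float)
    shift = max(1, int(round(alpha * len(r))))
    if shift >= len(r):
        raise ValueError("curve too short for this alpha")
    ft = (r[shift:] - r[:-shift]) / r[:-shift]  # one unit Ft_n per time point n
    return float(ft.mean())

# Hypothetical adsorption-phase curve: rises, then saturates
t = np.linspace(0.0, 5.0, 1000)
curve = 1.0 + 2.0 * (1.0 - np.exp(-t))

# One transient feature per moving scalar
features = {a: transient_feature(curve, a) for a in (0.1, 0.01, 0.001)}
```

Under this assumption a larger moving scalar compares points that are further apart, so it captures the overall rise, while a smaller one is closer to the local slope of the curve.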
Sensor drift compensation method
The process of this method is as follows:
(1) Feature extraction and selection: Extract the four features described above, and select two of them (the steady-state feature, defined as Feature 1, and one transient feature, defined as Feature 2) for subsequent analysis.
(2) Relationship between Feature 1 and Feature 2: Build a model relating Feature 1 and Feature 2, and use the parameter of this model to describe their relationship. The parameter is calculated for every sample of Month 1 and the average is defined as A. The equation is as follows:
$$F1(i)=1000 \times A(i) \times F2(i) \tag{4}$$
where i denotes a specific sample, \(F1(i)\) is the value of Feature 1 of sample i of Month 1, \(F2(i)\) is the value of Feature 2 of sample i of Month 1, and \(A(i)\) is the scale coefficient of sample i. A is the average of \(A(i)\).
(3) Relationship between Feature 1 of Month 1 and of the other months: Build a model relating Feature 1 of Month 1 to Feature 1 of each other month, and use the parameter of this model to describe their relationship. The parameter is calculated over all samples and its average is defined as \(B(j)\), where j is 4, 14, 16, 20, 22 and 36, representing Month 4, Month 14, Month 16, Month 20, Month 22 and Month 36, respectively. The equation is as follows:
$$F1(j)=B(j) \times F1(1) \tag{5}$$
where \(F1(1)\) represents the average value of Feature 1 in the first month and \(F1(j)\) represents the average value of Feature 1 in month j. \(B(j)\) can be regarded as the drift degree of month j.
(4) Drift compensation: Feature 1 is the feature to be drift compensated and is used for the subsequent analysis in this study. Parameter \(B(j)\) is the principal compensation parameter, and parameter A is used as an auxiliary parameter to improve the compensation accuracy. The calculation process is as follows:
$$F3(i)(j)=F1(i)(j)/B(j) \tag{6}$$
where \(F1(i)(j)\) is the value of Feature 1 of sample i of month j, and \(F3(i)(j)\) is the value of sample i of month j after the first step of drift degree compensation, which should be similar to the value of Feature 1 in Month 1.
$$P(i)(j)=F1(i)(j)/[1000 \times F2(i)(j)] \tag{7}$$
where \(F2(i)(j)\) is the value of Feature 2 of sample i of month j, and \(P(i)(j)\) is the actual scale coefficient of sample i of month j. For drift compensation, \(P(i)(j)\) should be transformed so that it is as close as possible to A.
$$M(i)(j)=P(i)(j) \times A/P(j) \tag{8}$$
where \(P(j)\) is the average value of \(P(i)(j)\) over month j. \(M(i)(j)\) can be regarded as the scale coefficient of sample i of month j after rescaling toward A. Real drift and measurement system drift are the two predominant sources of sensor drift, and the measurement system factors make the drift unpredictable. The purpose of calculating \(M(i)(j)\) is to partly reduce this part of the error.
$$N(i)(j)=(M(i)(j)+A)/2 \tag{9}$$
where \(N(i)(j)\) is the final scale coefficient, which accounts for drift induced by both the sensor itself and the measurement system, making the compensated result more accurate.
$$G(i)(j)=F3(i)(j)/[1000 \times M(i)(j)] \tag{10}$$
where \(G(i)(j)\) can be regarded as the compensated value of Feature 2 of sample i of month j.
$$R(i)(j)=1000 \times G(i)(j) \times N(i)(j) \tag{11}$$
where \(R(i)(j)\) is the final compensated value. The calculation of \(R(i)(j)\) combines Feature 1 and Feature 2 and accounts for drift induced by both the sensor itself and the measurement system, which leads to more accurate results.
(5) Drift compensation for all sensors: Apply this method to all sensors and all samples. The drift-compensated feature is thus obtained and used for the subsequent analysis.
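The steps above can be sketched for a single sensor as follows. This is an illustrative outline rather than the authors' implementation: the function names `fit_reference` and `compensate_month` and all numeric values are hypothetical. The drift degree \(B(j)\) is taken here as the ratio of the month's average Feature 1 to the Month 1 average, which is an assumption chosen to be consistent with Eq. (6) and with the definitions of \(F1(1)\) and \(F1(j)\).

```python
import numpy as np

def fit_reference(f1_ref, f2_ref):
    """Month 1 reference: A (mean of A(i), Eq. 4) and the mean of Feature 1."""
    f1_ref = np.asarray(f1_ref, dtype=float)
    f2_ref = np.asarray(f2_ref, dtype=float)
    a_i = f1_ref / (1000.0 * f2_ref)  # Eq. (4) solved for A(i)
    return float(a_i.mean()), float(f1_ref.mean())

def compensate_month(f1, f2, a, f1_ref_mean):
    """Apply Eqs. (6)-(11) to all samples of one month for one sensor."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    b = f1.mean() / f1_ref_mean  # assumed drift degree B(j)
    f3 = f1 / b                  # Eq. (6): first-step compensation
    p = f1 / (1000.0 * f2)       # Eq. (7): actual scale coefficient
    m = p * a / p.mean()         # Eq. (8): rescale so the monthly mean is A
    n = (m + a) / 2.0            # Eq. (9): final scale coefficient
    g = f3 / (1000.0 * m)        # Eq. (10): compensated Feature 2
    return 1000.0 * g * n        # Eq. (11): final compensated value

# Hypothetical Month 1 features (two samples)
a_ref, f1_ref_mean = fit_reference([1.0, 1.2], [0.002, 0.003])

# Hypothetical drifted month in which both features have doubled
compensated = compensate_month([2.0, 2.4], [0.004, 0.006], a_ref, f1_ref_mean)
```

Substituting Eq. (10) into Eq. (11) gives \(R(i)(j)=F3(i)(j) \times N(i)(j)/M(i)(j)\), so the second stage acts as a per-sample correction factor on the first-step result. Repeating the call for every sensor and month corresponds to step (5).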