New strategy to calculate robust impedance using RMHHT estimator


 Magnetotelluric (MT) method is an electromagnetic geophysical method for inferring the earth's subsurface electrical conductivity from measurements of natural geomagnetic and geoelectric field variation at the earth's surface. MT method is widely used in exploration surveys worldwide, but it was hardly applicable in urban areas because of large artificial electromagnetic noise. But in one day, there are several hours at midnight that most of the electric equipment is shutdown. The MT time-series data at midnight is much quiet than in the daytime. Therefore, we focused on calculating the MT impedance using the quiet time-series data. In this research, the data observed by the Phoenix System is used. We introduced a robust impedance estimator based on the Hilbert-Huang transform (RMHHT) and used a new strategy to calculate the broadband MT time-series data. We indicated that this technique needs 4-hour time-series data to get a reliable resistivity structure up to 1,000 seconds in the numerical simulation. This short measurement time makes it possible to carry out MT surveys in urban areas with strong noises. However, the biggest problem of the impedance estimator based on HHT is time-consuming. The total computation time can be reduced significantly by using this strategy. Finally, a successful case study using the midnight time-series data was demonstrated to get reliable resistivity structures in the areas contaminated by heavy noises.


INTRODUCTION 33
Magnetotelluric (MT) method is an electromagnetic geophysical method for inferring the 34 earth's subsurface electrical conductivity from measurements of natural geomagnetic and 35 geoelectric field variation at the earth's surface. MT method is widely used in exploration surveys 36 around the world. Still, along with urban constructions, artificial disturbances to the 37 electromagnetic observations are becoming more serious. MT exploration is hardly applicable in 38 urban areas with large artificial electromagnetic noise. 39 The first step in MT data processing is to estimate the impedance tensor in the frequency 40 domain from the measured time-series data. In standard MT data processing routines, the original 41 time series is divided into the short segments and then transforms the segments into the frequency 42 domain by Fourier Transform. Here we should know one segment is one dataset when doing the 43 impedance estimation in the frequency domain. The field data consists of natural sources and 44 local noise. Szarka(1988) and Junge(1996) summarized active and passive noise sources observed 45 in MT measurements. When the data segment contains heavy noise, it is difficult to differentiate 46 between the MT signals coming from the natural sources and the noise signals produced by the local environment (e.g., electric transmission lines, electric fences, trains), particularly urban 48 areas. If most of the time series data contain strong noise, any impedance estimator will fail to get 49 reliable results. But in one day, there is always several hours' time window at midnight that most 50 of the electric equipment is shutdown. The time-series data in the period is much quiet. In this paper, we consider using the quiet period time-series data only to calculate the impedance tensor.
In this section, three different impedance tensor estimators, BIRRP, SSMT-2000, and RMHHT, 86 are introduced to compare them using the field data in section 4.3. 87

BIRRP 88
Bounded Influence Remote Reference Processing, BIRRP (Chave and Thomson, 2004) is one 89 kind of typical conventional robust estimator to calculate the impedance tensor based on 90 windowed FFT. Transform. Finally, the impedance tensor at a specific frequency can be calculated in the 94 frequency domain as follow: 95 ).
(1) 96 The objective is to minimize the squares residual sum by the least-squares theory, and the 97 transform technique like HHT. 138 The impedance tensor derived from Eq. 1 is defined purely in the frequency domain and cannot 139 be used for the instantaneous spectra obtained by the HHT. It is similar to the method that 140 Berdichevsky (1973) proposed the impedance tensor estimation involved narrow bandpass 141 filtering in the time-frequency domain as follows: 142 Step 3: Selection scheme by hat matrix 186 Before the estimation of impedance, the leverage point should be removed. The hat matrix is 187 N by N matrix and defined as follows: 188 (8) 189 Chave (2003) suggested that the hat matrix's diagonal element, which is more than several times 190 2/N, is problematic. N denotes the number of data. Before the regression, the data where the 191 diagonal element of the hat matrix is more than two times 2/N should be removed by the selection 192 strategy to remove the influence of the leverage point. 193

194
Step 4: Estimation of the impedance tensor 195 The regression step is similar to the method that Neukirch et al. (2014) and Campanya et al. 196 (2014) described. The impedance tensor is calculated as follows : 197  where the n denotes the number of the data, denote the ℎ real part of , denote the 208 ℎ imaginary part of ; it is the same to , , denotes the real part of , denotes 209 the imaginary part of ; it is the same as . The symbol j denotes the imaginary number unit. 210 The total equations for all data can be transformed into matrix form as follows: 211

212
The bivariate complex linear regression problem converts to the real multivariate linear 213 regression problem. The Matlab intrinsic function "robustfit.m" can easily solve the regression 214 problem, and Huber weight is adopted. We also can solve the by Eq. 3, following the M-215 estimator. We have compared the two methods. The result is almost the same. 216 When x equals the remote reference N by two matrices [ , ]. The interstation transfer 217 function is calculated at first as follow: 218 (12) 220 Then impedance tensor is calculated using the relationship between the local transfer function 221 and interstation transfer function as follow: 222 (13) 223 It can be categorized into one kind of two-stage processing strategy. In this way, separating the calculation into the regression between the remote reference with all channels can avoid a 225 direct effect of coherent noise between local channels. If the remote magnetic field does not 226 contain correlated noise with the local site, it will work well. 227 Finally, the robust regression was bootstrapped to compute the data-dependent distribution of 228 impedance values and estimate the intrinsic data errors. Empirically, 1,024 iterations were 229 repeated, considering a sufficient trade-off between accuracy and computation time to evaluate 230 results' uncertainty. 231 232

The Property of Hilbert-Huang Transform 234
In the first test, the HHT was applied to synthetic data combining the following equations to 235 show the property of Hilbert-Huang Transform; 236 The sampling rate is 1,000 Hz, and each segment is 4 seconds. Both of the time segments are 239 added with 10% Gaussian noises. The combination of the two segments is shown in Fig. 1

. The 240
HHT spectrum is shown in Fig. 2. The color denotes the value of 10·log10(amp/max), where 241 "amp" denotes the amplitude of the IPs, "max" denotes the maximum amplitude of the IPs. To recover the digital data information, we should sample at least two points in one period, 249 followed by the sampling theory. But to get the accurate complex coefficient from the time series. 250 We suggest that it is better to sample 4 points in one period, and one continuous segment should 251 contain at least four times the period. For instance, the 32-second continuous time series with the 252 1-second sampling rate. We can get a reliable complex coefficient between 4 to 8 second. If each 253 segment satisfies this condition, we can combine the discontinuous segment directly, and the 254 HHT can express the spectrum variation well. We considered that we could combine the 255 discontinuous MT time-series data directly and then transform the time-series into the frequency 256 domain by HHT if each segment satisfies the condition to get a reliable complex coefficient. 257 Fig. 2 The HHT spectrum of time-series data, which is shown in Fig. 1. The color denotes the 260 value of 10·log10(amp/max), where "amp" denotes the amplitude of the IPs, "max" denotes the 261 maximum of the amplitude of the IPs. 262 263

The Test with Different Length of Synthetic Time-series data 264
In this section, we tried to compare the performance of RMHHT by different lengths of 265 synthetic MT time series data. The way to create synthetic time-series is similar to the method 266 proposed by Chen (2012). The 1-day magnetic time-series data from Memambetsu (MMB) 267 station was used as the synthetic magnetic time series data. MMB station is one of Magnetic 268 Observatory performs geomagnetic and geoelectric observations in Japan. The sampling rate is 1 269 second, and its unit is nT. At first, the time-series magnetic field h(t) is transferred to the 270 frequency domain using the Fourier transform. Then the impedance Z(ω), calculated from the 271 simple 1-D model, is multiplied with H(ω) to determine E(ω). Subsequently, the electric field 272 spectra are transformed back into the time domain to obtain the electric time-series e(t). 273 We tested the effectiveness of data-length using 1-hour, 2-hour, 3-hour, 4-hour, 5-hour, 6-274 hour, 12-hour, and 24-hour synthetic MT time-series data. The 1-hour time series data can only 275 get the maximum reliable period is 150 seconds. And the 2-hour time series data can get the 276 maximum reliable period is about 300 seconds. 4-hour time series data is relatively good enough using 2-hour and 4-hour synthetic MT time series data. The light blue lines are the true model, 279 which were calculated using a simple 1-D model. The 2-hour data set was used to calculate the 280 dark blue circles. The 4-hour data set was used to calculate the red circles. 281 The result of TE mode calculated using the different lengths of time-series at 874-second is 282 shown in Fig.4. The red dotted lines denote the true model, which was calculated using a 1-D 283 model. The horizontal axis denotes the length of the time-series data. From this result, we 284 confirmed that if there were 2-hour noise-free time series data, we could get reliable results at 285 least with a period of 300 seconds. If there are approximately 4-hour noise-free time-series data, 286 we could get reliable data at least with a period of 1,000 seconds by RMHHT.   Szarka (1988) and Junge (1996) summarized the active and passive noise sources observed in MT 337 measurements. Most of the artificial noise is caused by earthing currents originated from the 338 electrical equipment. Usually, the remote reference magnetic field has a high correlation with the 339 local magnetic field, and artificial noise doesn't correlate with each other. Therefore, the 340 correlation between the local magnetic field and remote reference magnetic field is a good 341 indicator of whether the noise is strong or not.  In this research, we analyzed the polarization directions at different frequencies for all sites. 386 As the signal strength at the dead band (1-10s) is very low, it is easy to influence the local noise. 387 The typical polarization directions for the dead band is shown in Fig.9 This data set is contaminated by typical industrial noise from a manufacturing plant from the 393 analysis above. The correlation of time-series between the local site and remote site, at site L6274 394 and L6276, is much higher at midnight. The magnetic field's polarization direction has a preferred 395 direction in the dead-band at all sites in the daytime. That means the data is contaminated by 396 strong noise in the dead band at all sites in the daytime. Garcia et al. (2002) also found that the 397 high-frequency band's MT signal was stronger at midnight. The signal to noise ratio is higher at 398 midnight than in the daytime at all frequency bands in this data set. Therefore, we consider using 399 the data at midnight only to calculate the impedance tensor. 400

The New Strategy to Calculate the Wideband Time-series Data Based on RMHHT 407
For the conventional method, the windowed FFT was widely used to calculate the spectrum 408 for a different frequency. The processing of broadband MT time-series data by HHT is 409 demonstrated in this section. The data set observed using the MTU-A geophysical instruments is 410 used. The time-series data are stored in three files. The high-frequency band (2,400 Hz) and the 411 middle-frequency band (150 Hz) were sampled intermittently. The length of each segment is 8 412 seconds and 1 second separately. The low-frequency band (15 Hz) was sampled continuously. 413 Table 2 shows the strategy for the discontinuous time series. Suppose each segment satisfies 414 the situation to get an accurate complex coefficient. We can combine the intermittent time-series 415 data directly, as the property of HHT mentioned in section 3.1. Considering the accuracy of the 416 complex coefficient data, we set the effective frequency to 4 times of sampling rate. Using the 417 high-frequency (2,400 Hz) data, calculate the period between 4/2,400 to 4/150 seconds. One 418 continuous segment contains about 64 windows of the maxima period. There are 72,000 data points in 1-hour, and 100,000 points (about one and a half hours) are used to calculate. Using the the middle-frequency band (150 Hz), we suggest that 2-hour time-series data is enough to get a 424 reliable result.   Table 3 shows the strategy for continuous time-series. As the computation of HHT is time-429 consuming. We design a strategy that can decrease the computation time much. Using the 15-Hz 430 data, calculate the period data between 4/15 to 5 seconds. There are 54,000 data points in 1-hour, 431 and 100,000 points (about 2 hours) are used to calculate. And we downsampled the continuous 432 15-Hz data to 1-Hz data to compute the period larger than 5 seconds. As shown in the simulation, 433 4-hour noise-free time series data can get a relatively reliable result until 1,000 seconds. Combine 434 all the data, we can calculate the full scall period from high frequency to 1,000 seconds by four 435

hours. 436
This strategy allows us to select data in the time domain. There are always several hours at 437 midnight that most of the electric equipment is shutdown. The time-series data in these periods 438 are much quiet. This strategy is suitable to use only the quiet period time-series data to calculate 439 the impedance tensor. 440

The Application to the Field Data 444
In this section, we will compare the performance of the three different estimators by the field   Fig. 11 The magnetotelluric sounding curves calculated using the data at site Y0626. SSMT-2000 493 was used to calculate the blue curves from one-day data, BIRRP was used to calculate the black 494 curves using daytime data from 12:00:00 to 16:00:00 in UTC on June 26. BIRRP was used to Fig. 12 The magnetotelluric sounding curves calculated using the data at site Y0626. SSMT-2000 499 was used to calculate the blue curves using one-day data, RMHHT was used to calculate the black 500 curves using the midnight data from 16:00:00 to 20:00:00 in UTC on June 26 and using L7158 as 501 remote site. BIRRP was used to calculate the red curves using the midnight data from 16:00:00 to 502 20:00:00 in UTC on June 26. 503 504 Similar results were obtained at the L6178 and L7158. Fig.13 and Fig.14 show the result 505 calculated using the data at site L6178 and L7158, respectively. The RMHHT result performs 506 well at the full-scale. In the dead band, as we used the relatively quiet data at the Y0625 as a 507 remote site, it performs as well as BIRRP. But in the long period of around 1,000-second, the 508 error bar calculated by RMHHT becomes lagger. Suppose there is another quiet midnight data, 509 we can combine them directly. In this way, we can get more data at the impedance estimation 510 step in the frequency domain. The result will become more stable. 511 Fig. 13 The magnetotelluric sounding curves calculated using the data at site L6178. SSMT-2000 514 was used to calculate the blue curves using one-day data, RMHHT was used to calculate the black 515 curves using the midnight data from 16:00:00 to 20:00:00 in UTC on June 26 and the Y0625 516 remote site. BIRRP calculates the red curves using the midnight data from 16:00:00 to 20:00:00 517 in UTC on June 26. 518 519 520 Fig. 14 The magnetotelluric sounding curves calculated using the data at site L7158. SSMT-2000 521 was used to calculate the blue curves using one-day data, RMHHT was used to calculate the black 522 curves using the midnight data from 16:00:00 to 20:00:00 in UTC on June 26 and the Y0625 frequency and middle-frequency band data. In the simulation using synthetic data sets, we have 530 shown that we could get a reliable result until 1,000 seconds from continuous 4-hour noise-free 531 time series data. At midnight, most of the electrical equipment is shutdown. Usually, the data at 532 midnight is much quiet than in the daytime. This paper demonstrated that a time window of 4-533 hour length was almost noise-free during the local night at site L7158, L6178, and Y0625. We 534 used data from this window alone to estimate impedances based on RMHHT. We compared the 535 performance with the conventional method Bounded Influence Remote Reference Processing, we can combine them. In this way, we can increase the number of data sets at the impedance 543 estimation step in the frequency domain. The result will become more stable. 544 545 Declarations 546

Availability of data and materials 547
The magnetic time-series data used to construct the synthetic MT data is downloaded from the