SeisDeNet: an intelligent seismic data Denoising network for the internet of things

Deep learning (DL) has attracted tremendous interest in many fields over the last few years. DL architectures based on convolutional neural networks (CNNs) have been successfully applied in computer vision, medical image processing, remote sensing, and many other fields, and recent work has shown that CNN-based models can also handle geophysical problems. Because seismic signals acquired by geophone equipment are contaminated by noise, this important multimedia resource cannot be used effectively in practice. To this end, from the perspective of seismic exploration informatization, this paper takes as its research object the data produced in seismic signal acquisition and energy exploration with cutting-edge technologies such as the Internet of Things and cloud computing, and presents a novel CNN-based seismic data denoising architecture (SeisDeNet). First, a multi-scale residual dense (MSRD) block is built to leverage the characteristics of seismic data. Then, a deep MSRD network (MSRDN) is proposed to restore noisy seismic data in a coarse-to-fine manner by cascading MSRD blocks. Additionally, the denoising problem is formulated as the prediction of transform-domain coefficients, so that the MSRDN removes more noise while preserving richer structural detail than denoising in the spatial domain. Using synthetic seismic records, the public SEG/EAGE salt and overthrust seismic model, and real field seismic data, the proposed method is compared qualitatively and quantitatively with other leading-edge schemes. The results show that the proposed scheme produces data with higher quality scores while retaining more useful signal than the other schemes. The denoising results confirm the feasibility of the approach and indicate that it is promising for suppressing seismic noise automatically.


Introduction
With the construction of the national digital detection network, new Internet of Things observation methods and technologies are gradually being applied [1][2][3][4]. This has largely realized the digitization, networking and integration of seismic monitoring and exploration and has greatly improved seismic monitoring and exploration capability. The application of digital technology provides rich first-hand data for seismic research and lays a good foundation for interpretation and prediction in seismic exploration. Noise attenuation is crucial for obtaining high-quality data from the seismic signals collected with geophone sensing equipment and networks [4]. In seismic exploration, however, these signals are usually badly contaminated and distorted by noise, which has an important impact on oil and gas exploration.
To substantially enhance the signal-to-noise ratio (SNR) and meet the needs of high-precision seismic exploration, this paper presents a novel transform-domain CNN architecture for denoising seismic data generated by the Internet of Things. The primary contributions are: (i) To fully exploit the features of seismic data, an MSRD block is proposed and built for restoring noisy seismic data. (ii) The denoising problem is formulated as the prediction of transform-domain coefficients, so that the MSRDN removes more noise while preserving more detail than denoising in the spatial domain. (iii) The method is evaluated qualitatively and quantitatively on synthetic seismic records, the public SEG/EAGE salt and overthrust seismic model, and real field seismic data. The proposed method yields signals with higher peak signal-to-noise ratios (PSNRs) and retains more useful data than other leading-edge methods, and its advantage is more pronounced at higher noise levels.
The remainder of the manuscript is structured as follows: Section 2 presents a brief review of related studies. Section 3 proposes the novel scheme, which is validated in Section 4. Lastly, Section 5 summarizes the work.

Related work
Many denoising methods have been developed for seismic exploration data from the Internet of Things. Canales proposed a seismic denoising method to suppress random noise and achieved promising results [5]. Since then, several effective schemes for suppressing random noise have been proposed, such as sparse-transform-based methods, dictionary-learning-based procedures, and the nonlocal means algorithm [8]. Among them, sparse representation of seismic signals has become popular. Conventionally, almost all denoising is performed in a transform domain [6,7,9,10,11]. In learning-based methods [12,13], an overcomplete dictionary, generally written as an explicit matrix, is inferred from a set of examples, and it must be trained on those examples before it can be applied.
Recently, with the continuous expansion of Internet of Things data and cloud platform data, the way data are processed has changed. Deep learning (DL) has become very attractive and demonstrates excellent ability in many areas, such as multimedia and machine learning, because it overcomes the shortcomings of common learning-based schemes by leveraging the convolutional neural network (CNN) architecture. Since the CNN-based SRCNN was first proposed in [14] for super-resolution (SR) in low-level vision, many methods [14][15][16][17][18][19][20][21][22] with different network depths, e.g., the very deep RCAN [22], have been developed. Network depth has a positive impact on SR performance, as evidenced by the growth in the number of convolutional layers from 3 to 400. In some recent CNN-based SR methods, several identical feature extraction modules are connected to build the entire network [19][20][21][22], and each module is essential in this structure. These CNN-based methods outperform the traditional methods. Some diversified SR network structures have also been proposed. Recently, attention-based SR network models have been proposed that use an addition operation in the output layer, avoiding the large amount of computation consumed by convolution kernel multiplication and thus completing image SR efficiently.
These SR methods operate in the spatial domain, but SR in a transform domain can better preserve the context and texture information of an image in its various layers. Guo et al. [26] therefore proposed a deep wavelet SR network for recovering high-resolution (HR) images, in which the "missing details" of the wavelet coefficients of low-resolution (LR) images are predicted. The same team then suggested an orthogonally regularized deep network [27], in which the discrete cosine transform (DCT) is integrated into CNNs. In addition, a face SR method was constructed on the basis of the wavelet transform and CNNs to capture the local textural details and global topology information of faces [28].
Therefore, a novel transform domain based MSRDN architecture for seismic signal denoising is proposed and the method is detailed in Section 3.

Proposed method
In this paper, we propose and build an MSRD block to fully exploit the features of seismic data for restoring noisy seismic data. The denoising problem is formulated as the prediction of transform-domain coefficients, by which noise can be further removed with the MSRDN. First, the proposed method for seismic data denoising is outlined. Then, the architectures of the proposed MSRDN and MSRD block are briefly described. Lastly, the transform domain is introduced.

Overview
Figure 1(a) presents a flowchart of our method. The noisy seismic data are first upsampled to the target high-resolution size to produce resized noisy seismic data, from which one low-frequency (LF) sub-band and several high-frequency (HF) sub-bands are obtained with a specific transform. To predict the transform coefficients of the target clear seismic data, two deep residual networks are applied: one on the LF sub-band, to preserve the global topology information, and one on the four HF sub-bands, to collect the structure and texture information. Lastly, the target clear seismic data are obtained via an inverse transform.
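The flow in the overview (forward transform, per-band prediction, inverse transform) can be sketched in a few lines. This is only an illustration under stated assumptions: a 1-level 2-D Haar wavelet stands in for the transform used in the paper, so it yields three HF sub-bands rather than four, and `lf_net` and `hf_net` are hypothetical placeholder callables rather than the trained residual networks.

```python
# Sketch of the pipeline: transform -> per-band prediction -> inverse transform.
# ASSUMPTIONS (not from the paper): a 1-level 2-D Haar wavelet stands in for
# the NSST/CSST, and lf_net / hf_net are hypothetical placeholder callables.

def haar_forward(x):
    """Split a 2-D list with even side lengths into one LF and three HF sub-bands."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(x), 2):
        rows = ([], [], [], [])
        for c in range(0, len(x[0]), 2):
            a, b = x[r][c], x[r][c + 1]
            d, e = x[r + 1][c], x[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2.0)  # local average (LF)
            rows[1].append((a - b + d - e) / 2.0)  # horizontal difference
            rows[2].append((a + b - d - e) / 2.0)  # vertical difference
            rows[3].append((a - b - d + e) / 2.0)  # diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, (lh, hl, hh)


def haar_inverse(ll, hf):
    """Rebuild the 2-D signal from the LF sub-band and the three HF sub-bands."""
    lh, hl, hh = hf
    out = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for r in range(len(ll)):
        for c in range(len(ll[0])):
            s, h, v, g = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            out[2 * r][2 * c] = (s + h + v + g) / 2.0
            out[2 * r][2 * c + 1] = (s - h + v - g) / 2.0
            out[2 * r + 1][2 * c] = (s + h - v - g) / 2.0
            out[2 * r + 1][2 * c + 1] = (s - h - v + g) / 2.0
    return out


def denoise(noisy, lf_net, hf_net):
    """Predict each sub-band separately, then invert the transform."""
    ll, hf = haar_forward(noisy)
    ll_hat = lf_net(ll)
    hf_hat = tuple(hf_net(band) for band in hf)
    return haar_inverse(ll_hat, hf_hat)


def identity(band):
    """Placeholder predictor that leaves a sub-band unchanged."""
    return [row[:] for row in band]


def suppress(band):
    """Placeholder predictor that zeroes a sub-band (crude smoothing)."""
    return [[0.0 for _ in row] for row in band]
```

With `identity` for both predictors the input is recovered exactly, which checks that the transform pair is invertible; with `suppress` on the HF bands each 2 × 2 block collapses to its mean.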

MSRD network structure for seismic denoising
We aim to restore clear seismic data $I_{Clear}$ from noisy seismic data $I_{Noisy}$. As shown in Fig. 1(b), our MSRDN has two parts: the shallow seismic signal feature extraction (SSFE) module and the deep seismic signal feature extraction (DSFE) module. The network is trained on pairs of noisy and clean signals. The mean square error (MSE) is the most popular objective function in image SR [15,17,18], but training with the MSE loss may not be the best option according to Lim et al. [29]. In this work, the mean absolute error (MAE) loss $L_{DE}$, i.e., the average absolute difference between the network output and the clear data, is used to reduce computation and avoid unnecessary training tricks. Specifically, the shallow feature $F_0$ is obtained from the noisy seismic data with two convolution layers:

$$F_0 = H_{SSFE2}\left(H_{SSFE1}\left(I_{Noisy}\right)\right)$$

where $H_{SSFE1}$ and $H_{SSFE2}$ are the convolution operations of the two layers. The extracted $F_0$ is then passed to the DSFE module, which is composed of cascaded MSRD blocks, each designed to collect and extract as much useful information as possible. A 1 × 1 convolutional layer is then applied to the concatenation $[F_1, F_2, \ldots, F_D]$, where $F_i$ (i = 1, 2, …, D) denotes the feature maps generated by the i-th MSRD block.
Figure 2 shows the proposed MSRD block. Each MSRD block has m paths, which are used to exploit features at different scales. Unlike the RDN model [21], each module contains a multi-bypass network with a different convolutional kernel for each bypass, so the proposed model can adaptively capture features at m scales. The model can therefore be viewed as both a wide and a deep neural network.
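To make the difference between the two objectives concrete, the following minimal sketch computes the MAE and the MSE between a network output and the clear data. The function names and the representation of a seismic section as a nested list are assumptions made for illustration; the paper's training code is not reproduced here.

```python
# Minimal sketch of the two candidate objectives (illustrative names only).

def mae_loss(pred, target):
    """Mean absolute error over all samples of two equally sized 2-D lists."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


def mse_loss(pred, target):
    """Mean square error over all samples of two equally sized 2-D lists."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count
```

A single large residual dominates the MSE much more than the MAE: for one sample that is off by 4 among four samples, the MAE is 1 while the MSE is 4.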

Multi-scale residual dense (MSRD) block
Let $F_{d-1}$ and $F_d$ denote the input and output of the d-th MSRD block. In the i-th convolutional layer (i = 1, …, c, …, C), $Y_i^{3\times3}$ and $Y_i^{5\times5}$ denote the functions of the 3 × 3 and 5 × 5 convolution layers, respectively, σ is the ReLU function, and [⋅] denotes the concatenation of the feature maps produced by the different convolutional kernels. Combining the previous information with the current multi-scale information retains short-path information. $H_{LFF}^{d}$ denotes the composite function of the 1 × 1 convolutional layer in the d-th MSRD block. Finally, the input information and the combined multi-scale information are aggregated to give $F_d$, the output of the d-th MSRD block.
Fig. 2 The architecture of the MSRD block
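The ingredients named above (two kernel sizes, ReLU, concatenation, a 1 × 1 fusion layer and aggregation with the block input) can be illustrated with a heavily simplified sketch. Everything specific here is an assumption: the signal is 1-D and single-channel, kernels of length 3 and 5 stand in for the 3 × 3 and 5 × 5 convolutions, only one layer is used so dense connections are omitted, and the final aggregation is taken to be a residual addition. It is not the block defined in the paper.

```python
# Simplified 1-D, single-channel stand-in for one multi-scale block.
# ASSUMPTIONS: one layer only, residual addition as the final aggregation,
# and hand-set weights instead of learned ones.

def conv1d_same(x, kernel):
    """Zero-padded 'same' correlation of a 1-D signal with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def multi_scale_block(f_prev, k3, k5, fuse_w):
    """Two scale paths -> concatenate -> 1x1 fusion -> add the block input."""
    path3 = relu(conv1d_same(f_prev, k3))   # small receptive field
    path5 = relu(conv1d_same(f_prev, k5))   # larger receptive field
    concat = [path3, path5]                  # channel-wise concatenation
    # A 1x1 convolution over two channels is a per-sample weighted sum.
    fused = [fuse_w[0] * a + fuse_w[1] * b for a, b in zip(*concat)]
    return [x + y for x, y in zip(f_prev, fused)]


def cascade(f0, blocks):
    """Apply several blocks in sequence and keep every intermediate output."""
    outputs, f = [], f0
    for k3, k5, fuse_w in blocks:
        f = multi_scale_block(f, k3, k5, fuse_w)
        outputs.append(f)
    return outputs
```

The list returned by `cascade` plays the role of the per-block feature maps that a later 1 × 1 layer would fuse.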

Transform-domain analysis for seismic data
Because wavelets can sparsely represent one-dimensional signals that are smooth away from point discontinuities, they have been used successfully to represent digital signals. However, image functions with curves and straight lines in higher dimensions cannot be "optimally" represented with wavelet analysis [30]. Subsequently, sparser transforms [31][32][33] have been presented, such as the curvelet transform, contourlet transform (CT), non-subsampled CT (NSCT), shearlet transform (ST), non-subsampled ST (NSST) and compactly supported ST (CSST), which exploit the anisotropic regularity of a surface along its edges. Among these, the CSST is optimally sparse.
Generally, a sparse representation of signals benefits signal processing tasks. To improve the sparsity of signal representations, Do and Vetterli developed the CT [31]. Its two primary features are directionality and anisotropy, and its advantage over curvelets, bandelets and other geometrically driven representations is a relatively simple and efficient wavelet-like implementation using iterated filter banks.
Next, the sparsity of these transforms is analyzed. The denoising effect is determined by how sparsely a transform represents the effective signal [32], so denoising performance improves as the sparsity of the representation increases. Figure 3 shows the reconstruction errors in the wavelet transform, curvelet transform, NSST and CSST domains for the data presented in Fig. 4(a). NSST and CSST have the smallest approximation errors when the same percentage of coefficients is retained, and their errors are close to zero when only 6% of the coefficients are retained, indicating their superior sparsity. In addition, the literature [33] indicates that compactly supported shearlets are optimally sparse.
The HF coefficients of CSST, NSST and the wavelet transform (WT) are compared in Fig. 4. CSST represents curved events more accurately. We note that these transforms can be applied in various denoising networks with improved performance. A further experiment evaluating the role of the transform is reported in Section 5.
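The kind of comparison behind such an error curve can be sketched as follows: keep only the largest-magnitude fraction of the coefficients and measure how much energy is lost. The sketch assumes an orthonormal transform, for which the relative reconstruction error equals the relative energy of the discarded coefficients; shearlet systems are generally redundant frames, so for them this is only an approximation, and the coefficient lists in the usage note are made-up numbers.

```python
import math

# Relative approximation error when only the largest coefficients are kept.
# ASSUMPTION: the transform is orthonormal, so the discarded coefficient
# energy equals the reconstruction error energy (Parseval's identity).

def relative_error_keep_top(coeffs, fraction):
    """Relative l2 error after keeping the top `fraction` of coefficients by magnitude."""
    n_keep = min(len(coeffs), max(0, math.ceil(fraction * len(coeffs))))
    ranked = sorted(coeffs, key=abs, reverse=True)
    discarded = ranked[n_keep:]
    total_energy = sum(c * c for c in coeffs)
    if total_energy == 0.0:
        return 0.0
    return math.sqrt(sum(c * c for c in discarded) / total_energy)
```

For example, a sparse coefficient set such as `[10.0, 0.1, -0.1, 0.1]` loses under 2% of its norm when only one coefficient is kept, whereas a flat set such as `[5.0, 5.0, 5.0, 5.0]` still loses about 71% when half are kept.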

Experimental results
Experiments are conducted to evaluate the performance of MSRDN qualitatively and quantitatively against the following seismic denoising methods: traditional methods (wavelet-based and curvelet-based methods) and DL-based methods (VDSR [15], the multi-scale residual network (MSRN) [20] and the residual dense network (RDN) [21]).

Seismic datasets
The basic dataset is synthesized from many seismic records containing linear events, curvilinear events, events with various dip angles, and fault events, with a sampling frequency of 1000 Hz and 150 traces. A Ricker wavelet with the following expression is used as the seismic wavelet:

$$r(t) = \left(1 - 2\pi^2 f^2 t^2\right) e^{-\pi^2 f^2 t^2}$$

where t is time and f is the dominant frequency of the wavelet. Figure 5(a) shows part of the synthetic seismic data. In addition, a migrated stack profile generated from the SEG/EAGE salt and overthrust model [34] is presented in Fig. 5(b). Following [17,18], these seismic records are rotated by 45°, 90°, 135°, 180°, 270°, and 360°. Random noise of various levels is then added to the original and rotated datasets to obtain additional expanded versions, of which 80% are selected for training and 20% for testing.
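A short sketch of how such a wavelet and its noisy versions could be generated is given below. It uses the standard Ricker expression with f read as the peak (dominant) frequency; the 1000 Hz sampling frequency is from the text, while the 30 Hz peak frequency in the usage note, the 0.1 s half-width, the Gaussian noise model and all function names are assumptions for illustration.

```python
import math
import random

# Standard Ricker wavelet and a simple noisy-trace generator.
# ASSUMPTIONS: f_peak is the dominant frequency; noise is zero-mean Gaussian.

def ricker(t, f_peak):
    """Ricker wavelet amplitude at time t (seconds) for peak frequency f_peak (Hz)."""
    a = (math.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def ricker_trace(f_peak, fs=1000.0, half_width=0.1):
    """Sample the wavelet symmetrically around t = 0 at sampling frequency fs."""
    n = int(round(half_width * fs))
    return [ricker(k / fs, f_peak) for k in range(-n, n + 1)]


def add_noise(trace, level, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `level`."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, level) for v in trace]
```

For instance, `add_noise(ricker_trace(30.0), 0.1)` gives one noisy 201-sample wavelet; the clean wavelet peaks at 1 at t = 0 and first crosses zero at t = 1/(√2·π·f).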

Implementation details
Our MSRDN contains 10 MSRD blocks. For training, four HF sub-bands and one LF sub-band are produced by applying a 1-level NSST to the training seismic data; the sub-bands are then cropped into 48 × 48 patches with an overlap of 24 pixels. The initial learning rate is $10^{-4}$ for all layers and is halved every 50 epochs. The batch size is set to 64. The method is implemented in the Torch7 framework on an NVIDIA Tesla P100, and the ADAM optimizer is used for updating. Training our model takes approximately 6 hours for 200 epochs.
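The learning-rate schedule and the patch layout described above can be written down directly. The numbers (initial rate 10^-4, halving every 50 epochs, 48-sample patches with a 24-sample overlap) come from the text; the function names are illustrative, and dropping any trailing remainder that does not fill a whole patch is an assumption.

```python
# Step learning-rate schedule and patch start positions along one axis.
# ASSUMPTION: a trailing remainder shorter than one patch is discarded.

def learning_rate(epoch, base_lr=1e-4, step=50, factor=0.5):
    """Learning rate for a 0-based epoch index, halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)


def patch_origins(length, patch=48, overlap=24):
    """Start indices of overlapping patches along an axis of the given length."""
    stride = patch - overlap
    return list(range(0, length - patch + 1, stride))
```

Over the 200 training epochs the rate takes four values, ending at one eighth of the initial rate; a 96 × 96 sub-band yields 3 × 3 = 9 patches.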

Comparison with traditional and leading edge methods
The denoising performance of our method is first checked on synthetic seismic data. The peak signal-to-noise ratio (PSNR) [35] is used to assess the reconstruction results quantitatively:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{MAX_I^{2}}{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(X'(i,j) - X(i,j)\right)^{2}}\right)$$

where $X'$ and $X$ are the M × N denoised and clear seismic data, respectively, and $MAX_I$ is the largest possible pixel intensity value. All models are trained on an identical training set, and the released codes are used for the comparison models. Tables 1 and 2 list the PSNR values (in dB), with the best values in bold. The PSNRs of our method are significantly higher than those of the other schemes on seismic data. In addition, Figs. 6 and 7 indicate that our method gives better qualitative results, with less residual coherent and incoherent noise.
In addition, noisy seismic data from the Liaohe depression, China, acquired in the same survey area with the same excitation and reception conditions, are selected as field data examples to validate the processing results of our method. To avoid losing valid information, these data are first lightly processed with the conventional random-noise attenuation module of a large processing system to generate the target clear data. Random noise of various levels is then added to the target data so that the network can learn to distinguish noise from effective signals. As with the synthetic data, the real seismic data are rotated by 45°, 90°, 135°, 180°, 270°, and 360° to obtain expanded versions; 80% of the versions are used for training and the remainder for testing. Fig. 8(a) shows the original noisy data, and Fig. 8(b) shows the data denoised with the proposed scheme, in which effective signals are highlighted, especially in the red rectangle, with a clearer interlayer structure and enhanced continuity of the events. Overall, the proposed method also achieves a satisfactory result on real field seismic data.
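The PSNR as defined in terms of X′, X and MAX_I can be computed as in the sketch below. The nested-list representation, the default `max_i=1.0` and the choice to return infinity for identical inputs are assumptions made for the example.

```python
import math

# PSNR between denoised and clear data, both M x N nested lists.
# ASSUMPTIONS: data are scaled so that max_i is the peak value (default 1.0);
# identical inputs (zero error) are reported as infinite PSNR.

def psnr(denoised, clean, max_i=1.0):
    """Peak signal-to-noise ratio in dB."""
    m, n = len(clean), len(clean[0])
    mse = sum(
        (d - c) ** 2
        for d_row, c_row in zip(denoised, clean)
        for d, c in zip(d_row, c_row)
    ) / (m * n)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_i ** 2 / mse)
```

Because the measure is logarithmic, halving the root-mean-square error raises the PSNR by about 6 dB.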

Discussion
In this section, we discuss the effectiveness of the transform domain through an ablation study. Specifically, the prediction of CSST coefficients is introduced into seismic data denoising and its contribution is evaluated. Four methods (VDSR, MSRN, RDN and our MSRDN) are each integrated with CSST prediction. Figure 9(a) presents the PSNRs of MSRDN with and without CSST prediction on seismic data with various noise levels. Figure 9(b) shows, from left to right, the average PSNRs of VDSR, MSRN and RDN at a noise level of 0.1. Significant and consistent improvements are observed across all networks and benchmarks when CSST is integrated, demonstrating that predicting CSST coefficients is more effective than working in the spatial domain.

Conclusions
We present a CNN-based seismic data denoising method in this work. To improve seismic denoising performance, an MSRDN consisting of a set of cascaded MSRD blocks is proposed to exploit the features of seismic data. In addition, applying a transform-domain operator to the network structure preserves richer detail in noisy seismic signals and further improves denoising performance. The qualitative and quantitative experimental results show that our method significantly outperforms other state-of-the-art methods in seismic data restoration.