Superimposed Pilot-based Channel Estimation for RIS-Assisted IoT Systems Using Lightweight Networks

Conventional channel estimation (CE) for Internet of Things (IoT) systems encounters challenges such as low spectral efficiency, high energy consumption, and blocked propagation paths. Although superimposed pilot-based CE schemes and the reconfigurable intelligent surface (RIS) could partially tackle these challenges, limited research has been done on a systematic solution. In this paper, a superimposed pilot-based CE scheme with RIS assistance is proposed, and its performance is further enhanced by neural networks. Specifically, at the user equipment (UE), the pilot for CE is superimposed on the uplink user data to improve the spectral efficiency and reduce the energy consumption of IoT systems, and two lightweight networks at the base station (BS) alleviate the computational complexity and processing delay of the CE and symbol detection (SD). These dedicated networks are developed in a cooperative manner. That is, conventional methods are employed to perform initial feature extraction, and the developed neural networks (NNs) are oriented to learn from the extracted features. With the assistance of the extracted initial features, the amount of training data needed for network training is reduced. Simulation results show that the computational complexity and processing delay are decreased without sacrificing the accuracy of CE and SD, and that the normalized mean square error (NMSE) and bit error rate (BER) performance at the BS is improved under parameter variations.


I. INTRODUCTION
As the cornerstone of future Internet of Things (IoT) connectivity, the evolution of fifth-generation (5G) and sixth-generation (6G) networks has attracted consistent attention in IoT applications. Examples include intelligent buildings connected to the internet to manage different devices [1], smart health care and intelligent driving [2], and home automation [3]. In these IoT systems, channel estimation (CE) plays critical roles, such as overcoming channel time variation [4] or the increase of occlusion probability, and adjusting to an affordable transmission power using appropriate modulation and coding methods [5].
The CE for IoT systems is vital for effective receiver operation [4], [6], [7]. In IoT downlink systems, a pilot-based hybrid CE method is introduced in [4], which combines the 1-D time-domain Wiener filter technique with a computationally simple maximum likelihood estimator in the frequency domain, and an improved computationally efficient linear minimum mean square error (MMSE) estimator for downlink IoT systems is proposed in [7]. As for IoT uplink systems, the least-squares (LS) and MMSE based CE is adopted in [6]. A low-complexity CE algorithm based on the conventional LS method is proposed in [8] for downlink narrowband IoT systems, and the downlink CE of broadband IoT systems is discussed in [9]. Despite the CE schemes in [4] and [6]-[9], there is still room for improving the spectral efficiency and energy consumption. On the one hand, existing pilot-based CE for an IoT system must allocate additional spectrum resources to transmit pilots, which causes low spectral efficiency. On the other hand, energy consumption needs to be handled in IoT systems. One extreme case in [10] is that the user equipment (UE) aims at a battery lifetime of up to ten years. In this situation, transmitting the pilots and data of the IoT system separately increases the energy consumption and makes the system target hard to achieve. Without extra time-frequency resources for the pilot, the strategy in [11] transmits the pilot and data in a superimposed manner, which alleviates the issues of low spectral efficiency and high energy consumption. This inspires us to propose a CE solution for IoT systems based on superimposed pilots.

C. Qing, L. Wang, L. Dong, G. Ling and J. Wang are with the School of Electrical Engineering and Electronic Information, Xihua University, Chengdu, 610039, China (E-mail: qingchj@mail.xhu.edu.cn).
Besides, IoT communication is commonly blocked due to the complex propagation scenarios of industrial IoT (IIoT) [4]. Thus, increasing the robustness of the communication link is an urgent task to guarantee CE performance. To resolve blocked propagation paths, the reconfigurable intelligent surface (RIS) provides an attractive option [12]. The RIS, an artificial panel of electromagnetic material, is made from a large array of low-cost passive scattering elements, which can manipulate the wireless environment by adjusting the amplitude or phase shift of reflected signals [12]. Different from the traditional amplify-and-forward relay, the passive elements in the RIS consume little energy [13]. According to [14], the influence of the material used in the RIS is another interesting topic, which gives us a novel perspective for channel estimation in future works. By considering a passive RIS, we focus on the superimposed pilot-based channel estimation for RIS-assisted communication systems in this paper. In recent years, embedding the RIS in IoT systems has been envisioned as a revolutionary means to transform a passive wireless communication environment into an active reconfigurable one, which can provide environmental intelligence for different communication objectives [15]. In addition, the RIS also enhances system throughput by at least 40 percent [16] and system coverage by 1/3 [17]. Therefore, deploying the RIS for the superimposed pilot-based CE of IoT systems is a highly desired approach to tackle the issue of blocked propagation paths, which, however, has not been well investigated in existing works.
In recent years, deep learning (DL) has made a major breakthrough in advanced information processing and computer vision [18]. As described in [19], the essence of DL is to learn the mapping relationship between input and output through training data samples, obtain a model structure, and then feed the test data to the model to obtain the predicted output. Even so, it is still hard to directly explain the internal mechanism and theoretical analysis of DL [18], [19]. Potential applications of DL in the physical layer have been increasingly recognized due to the new features of future communications, such as complex scenarios with unknown channel models and precise processing requirements [18]. In addition, DL-based CE in RIS-assisted communication systems has also aroused extensive research interest. Two convolutional neural network (CNN)-based methods are proposed in [20] to execute the de-noising and approximate the optimal MMSE CE solution. An enhanced extreme learning machine (ELM)-based CE is proposed in [21] to facilitate accurate CE. In [22], an untrained deep neural network (DNN) based on the deep image prior network is proposed to de-noise the effective channel of the system acquired by the conventional pilot-based LS estimation and obtain a more accurate estimation. However, DL-based superimposed CE in RIS-assisted communication systems has not been investigated, which is particularly important for an IoT system to reduce the energy consumption with high spectral efficiency.
To reduce the energy consumption while maintaining the spectral efficiency of IoT systems, tackle blocked propagation paths, and enhance the accuracy of CE, we investigate the superimposed pilot-based CE for RIS-assisted IoT systems in this paper. The main contributions of our work are summarized as follows:
1) We introduce a superimposed pilot-based and RIS-assisted mode into the IoT system to alleviate the issues of spectral efficiency and energy consumption. The non-superimposed pilot-based CE in [7] encounters the issue that its pilot/data cannot be received completely, and the superimposed pilot-based method can effectively alleviate this issue (especially when the channel changes frequently). On the one hand, by employing the superimposed pilot-based mode, we reduce the energy consumption and improve the spectral efficiency of IoT systems. On the other hand, the robustness of the communication link is enhanced by employing the RIS. In particular, the combination of the superimposed pilot-based mode and the RIS further reduces energy consumption, and thus prolongs the battery life of the UE. As far as we know, with prolonged battery life of the UE and enhanced spectral efficiency of the IoT system, the issue of improving the accuracy of the CE at the BS has not been well addressed in [10], [23]. Therefore, it is beneficial to study the superimposed pilot-based CE for RIS-assisted systems.
2) We develop two dedicated lightweight networks to reduce the computational complexity and processing delay of the CE and symbol detection (SD) at the BS. From the perspective of integrating the non-NN and NN-based solutions, the initial features are highlighted by employing conventional estimation and detection methods to perform feature extraction, and the lightweight networks are oriented to learn from the highlighted initial features. Thus, the non-NN and NN-based solutions cooperatively improve the CE and SD, while keeping the developed networks lightweight. Owing to the assistance of the non-NN mode, both the CE and SD networks are shallow networks and thus have a lightweight structure. The computational complexity of the proposed method is lower than that of the conventional methods, e.g., MMSE channel estimation and MMSE equalization, which saves computational resources and shortens the processing delay at the BS.
3) With the computational complexity and processing delay reduced by using the de-noising network, feature extraction, and feature fusion, we further improve the NMSE and BER performance at the BS. For CE, we exploit the learning ability of the developed CE network for de-noising (suppressing the superimposed interference and noise) and feature extraction, which alleviates the influence of superimposed interference. The improved CE refines the NMSE performance, and thus improves the subsequent SD. Besides, the developed fusion network captures additional features for SD and effectively improves the BER performance at the BS.
The remainder of this paper is structured as follows. In Section II, we present the system model of superimposed pilot-based CE. The proposed method is presented in Section III. The computational complexity is analyzed in Section IV, followed by numerical results in Section V. Finally, Section VI concludes our work.
Notations: Boldface lowercase and uppercase letters represent vectors and matrices, respectively. $(\cdot)^T$ is the transpose. $\odot$ stands for the Hadamard product. $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ represent the real and imaginary parts of complex numbers, respectively.

II. SYSTEM MODEL
As shown in Fig. 1, we consider a frequency-selective Rician fading RIS-assisted IoT system with OFDM modulation. In Fig. 1, supposing that the propagation path is blocked by buildings, the RIS is installed on the surface of a building to alleviate this issue. $\mathbf{h}_D$ denotes the channel frequency response (CFR) of the transmitter-receiver link. $\mathbf{h}_{B,g}$ and $\mathbf{h}_{Q,g}$ represent the aggregated CFRs of the RIS-receiver link and the transmitter-RIS link related to the $g$-th sub-surface, respectively. The RIS is composed of many passive reflecting elements, and to reduce the complexity and training overhead of CE, adjacent elements are grouped into a sub-surface that shares a common reflection coefficient [24]. Besides, the RIS control link is used to adjust the phase shift. This system considers $N$ sub-carriers and assumes that the maximum delay spread $L$ is shorter than the cyclic prefix (CP) length $L_{CP}$, i.e., $L < L_{CP}$ [6], [24], to resist inter-symbol interference (ISI) and inter-carrier interference (ICI). The frequency-domain signal received at the receiver is expressed as
$$\mathbf{y} = \mathbf{h} \odot \left( \sqrt{\lambda P}\, \mathbf{x}_p + \sqrt{(1-\lambda) P}\, \mathbf{x}_d \right) + \mathbf{w}, \tag{1}$$
where $\lambda \in [0, 1]$ is the power proportional coefficient of the pilot, $P$ stands for the total transmitting power, and $\mathbf{h} = [h(1), h(2), \cdots, h(N)]^T \in \mathbb{C}^{N \times 1}$ represents the CFR between the receiver and transmitter.
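$\mathbf{x}_p \in \mathbb{C}^{N \times 1}$ is the pilot and $\mathbf{x}_d \in \mathbb{C}^{N \times 1}$ denotes the modulated signal. $\mathbf{w} \in \mathbb{C}^{N \times 1}$ is the noise vector, which follows the circularly symmetric complex Gaussian (CSCG) distribution with mean zero and variance $\sigma_w^2$. The composite CFR between the receiver and transmitter is given as [24]
$$\mathbf{h} = \mathbf{h}_D + \mathbf{H}_C \boldsymbol{\phi}, \tag{2}$$
where $\mathbf{h}_D \in \mathbb{C}^{N \times 1}$ denotes the CFR of the transmitter-receiver link, $\mathbf{H}_C \in \mathbb{C}^{N \times G}$ collects the cascaded CFRs reflected by the RIS, and $\boldsymbol{\phi} = [\phi_1, \phi_2, \cdots, \phi_G]^T$ stands for the phase-shift vector, which is given by
$$\phi_g = \alpha_g e^{j\theta_g}, \quad g = 1, \cdots, G, \tag{3}$$
where $\alpha_g$ and $\theta_g \in [0, 2\pi]$ denote the reflection amplitude and the phase shift of the $g$-th sub-surface, respectively, and $G$ is the number of sub-surfaces. To simplify the hardware design and maximize the reflection power of the RIS, we fix $\alpha_g = 1, \forall g = 1, \cdots, G$, and only adjust the phase shift $\theta_g$ [21].
By denoting
$$\mathbf{H}_C = \left[ \mathbf{h}_{Q,1} \odot \mathbf{h}_{B,1}, \cdots, \mathbf{h}_{Q,G} \odot \mathbf{h}_{B,G} \right], \tag{4}$$
where $\mathbf{h}_{Q,g} \in \mathbb{C}^{N \times 1}$ and $\mathbf{h}_{B,g} \in \mathbb{C}^{N \times 1}$ represent the aggregated CFRs of the transmitter-RIS link and RIS-receiver link related to the $g$-th sub-surface, respectively, and according to (1), (2) and (4), the received signal in the frequency domain is rewritten as
$$\mathbf{y} = \left( \mathbf{h}_D + \mathbf{H}_C \boldsymbol{\phi} \right) \odot \left( \sqrt{\lambda P}\, \mathbf{x}_p + \sqrt{(1-\lambda) P}\, \mathbf{x}_d \right) + \mathbf{w}. \tag{5}$$
With the received signal $\mathbf{y}$, the LS estimation and ZF equalization are used to highlight the initial features of estimation and to ease the network learning, respectively. In this paper, to save bandwidth resources and energy [25], we adopt the superimposed pilot method for CE and SD, as shown in Fig. 2. Two dedicated lightweight networks, namely CE-Net and FUS-Net, are developed to implement CE and SD, respectively. Distinguished from the conventional methods, e.g., the MMSE CE and MMSE SD, non-NN and NN-based approaches are integrated in our work, in which the CE-Net and FUS-Net are embedded into these conventional methods to cooperatively improve the performance of CE and SD.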
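To make the signal model concrete, the following minimal sketch generates one received OFDM symbol under the superimposed-pilot model described above. It is only an illustration under stated assumptions: the cascaded RIS channel is taken as the element-wise product of the two aggregated CFRs, the channels are drawn as i.i.d. complex Gaussian placeholders (not the COST2100 model used in the paper), and all function names are hypothetical.

```python
import cmath
import math
import random


def zadoff_chu(length, root=1):
    """Unit-modulus Zadoff-Chu sequence (root should be coprime with length)."""
    cf = length % 2
    return [cmath.exp(-1j * math.pi * root * n * (n + cf) / length)
            for n in range(length)]


def qpsk(length, rng):
    """Random unit-power QPSK symbols."""
    a = 1 / math.sqrt(2)
    return [complex(rng.choice((-a, a)), rng.choice((-a, a)))
            for _ in range(length)]


def cgauss(rng, var=1.0):
    """One circularly symmetric complex Gaussian sample with variance var."""
    std = math.sqrt(var / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))


def composite_cfr(h_d, h_q, h_b, theta):
    """Assumed model: h = h_D + sum_g exp(j*theta_g) * (h_Q,g * h_B,g), per sub-carrier."""
    h = list(h_d)
    for g, th in enumerate(theta):
        phi = cmath.exp(1j * th)  # reflection amplitude alpha_g fixed to 1
        for n in range(len(h)):
            h[n] += phi * h_q[g][n] * h_b[g][n]
    return h


def received_signal(h, x_p, x_d, lam, power, sigma2, rng):
    """y = h * (sqrt(lam*P) x_p + sqrt((1-lam)*P) x_d) + w, per sub-carrier."""
    y = []
    for hn, p, d in zip(h, x_p, x_d):
        s = math.sqrt(lam * power) * p + math.sqrt((1 - lam) * power) * d
        y.append(hn * s + cgauss(rng, sigma2))
    return y


rng = random.Random(0)
N, G = 32, 12                      # sub-carriers and sub-surfaces (Section V-A values)
h_d = [cgauss(rng) for _ in range(N)]                       # placeholder channels
h_q = [[cgauss(rng) for _ in range(N)] for _ in range(G)]
h_b = [[cgauss(rng) for _ in range(N)] for _ in range(G)]
theta = [rng.uniform(0, 2 * math.pi) for _ in range(G)]
h = composite_cfr(h_d, h_q, h_b, theta)
x_p, x_d = zadoff_chu(N), qpsk(N, rng)
y = received_signal(h, x_p, x_d, lam=0.15, power=1.0, sigma2=0.01, rng=rng)
```

Because pilot and data share the same $N$ sub-carriers, no extra resource elements are spent on the pilot; the price is that the data term acts as interference on the pilot during CE.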

III. SUPERIMPOSED PILOT-BASED CHANNEL ESTIMATION
As shown in Fig. 2, first, we superimpose the pilot $\mathbf{x}_p$ and the modulated signal $\mathbf{x}_d$ at the UE. Second, we perform the inverse fast Fourier transform (IFFT) and add the CP. Third, the signal propagates over the wireless channel. At the BS, the received signal $\mathbf{y}$ is obtained after removing the CP and performing the fast Fourier transform (FFT). Next, the conventional LS estimation is employed to highlight the initial features of CE for the lightweight NN. Since a lightweight NN possesses very limited learning ability, the highlighted initial features orient the learning of CE-Net and thus improve the effectiveness of CE. Similarly, the developed FUS-Net is also a lightweight network and thus needs to extract the initial equalization features. In this paper, the conventional ZF equalization is employed as a feature extractor to capture the initial equalization feature $\mathbf{s}_{ZF}$. With the initial equalization feature $\mathbf{s}_{ZF}$, the coarse data $\mathbf{s}_d$ is obtained by cancelling the superimposed pilot. Then, the coarse data $\mathbf{s}_d$ and the received signal $\mathbf{y}$ are fed into the FUS-Net to produce the detected symbol $\mathbf{s}_{FUS}$. In Section III-A, the initial feature extraction for CE-Net is presented. Then, we develop a lightweight NN, named CE-Net, to improve the performance of CE in Section III-B. In Section III-C, the initial feature extraction for FUS-Net is elaborated. Next, a fusion learning-based lightweight NN, named FUS-Net, is used to refine the performance of SD in Section III-D. Last, in Section III-E, the details of online deployment are described.

Algorithm 1 (recovered portion: end of the training phase and the testing phase)
Update $\Theta_{FUS}$ by using the Adam algorithm (learning rate $\gamma_2$) to minimize $\mathrm{Loss}_{\text{FUS-Net}}$.
10: end for
Testing phase:
11: Load the trained parameters $\Theta_{CE}$ and $\Theta_{FUS}$.
12: Perform LS estimation to obtain $\mathbf{h}_{LS}$ using Eq. (6).
13: Reshape the complex-valued $\mathbf{h}_{LS}$ to the real-valued $\tilde{\mathbf{h}}_{LS}$ using Eq. (7).
14: Predict $\tilde{\mathbf{h}}_{CE}$ based on $\Theta_{CE}$ and $\tilde{\mathbf{h}}_{LS}$ using Eq. (8).
15: Perform ZF equalization to obtain $\mathbf{s}_{ZF}$ using Eq. (10).
16: Cancel the superimposed interference from the pilot to obtain the coarse data $\mathbf{s}_d$.
17: Splice $\mathbf{s}_d$ and $\mathbf{y}$ into the real-valued $\tilde{\mathbf{s}}_{in}$ using Eq. (13).
18: Predict $\tilde{\mathbf{s}}_{FUS}$ based on $\Theta_{FUS}$ and $\tilde{\mathbf{s}}_{in}$ using Eq. (14).

A. Estimation Feature Extraction
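With the received signal $\mathbf{y}$, the initial features of CE are extracted by LS estimation and used as the input of the CE-Net. Using the LS estimation, the initial CFR $\mathbf{h}_{LS} \in \mathbb{C}^{N \times 1}$ is given by
$$\mathbf{h}_{LS} = \left[ \frac{y(1)}{\sqrt{\lambda P}\, x_p(1)}, \frac{y(2)}{\sqrt{\lambda P}\, x_p(2)}, \cdots, \frac{y(N)}{\sqrt{\lambda P}\, x_p(N)} \right]^T, \tag{6}$$
where $y(n)$ and $x_p(n)$, $n = 1, 2, \cdots, N$, are the received signal and transmitted pilot on the $n$-th sub-carrier, respectively, and $\lambda$ and $P$ are as defined in (1). The extracted feature, i.e., $\mathbf{h}_{LS}$, is employed for the subsequent enhancement of CE.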
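The effect of the superimposed data on the LS estimate can be seen in a small noise-free sketch. It assumes the pilot power scaling $\sqrt{\lambda P}$ is removed in the LS step and uses hypothetical helper names; the toy channel is arbitrary and not taken from the paper.

```python
import cmath
import math


def ls_estimate(y, x_p, lam, power):
    """Per-sub-carrier LS estimate: divide by the known, power-scaled pilot."""
    scale = math.sqrt(lam * power)
    return [yn / (scale * pn) for yn, pn in zip(y, x_p)]


def nmse(h_hat, h):
    """Normalized squared error between an estimate and the true CFR."""
    err = sum(abs(a - b) ** 2 for a, b in zip(h_hat, h))
    return err / sum(abs(b) ** 2 for b in h)


lam, power, N = 0.15, 1.0, 8
h = [(1 + 0.1 * n) * cmath.exp(1j * 0.3 * n) for n in range(N)]   # arbitrary toy CFR
x_p = [cmath.exp(-1j * math.pi * n * n / N) for n in range(N)]    # unit-modulus pilot
x_d = [cmath.exp(1j * math.pi * (2 * (n % 4) + 1) / 4) for n in range(N)]  # QPSK points

# Noise-free reception with the pilot only, and with pilot plus superimposed data.
y_pilot_only = [hn * math.sqrt(lam * power) * p for hn, p in zip(h, x_p)]
y_superimposed = [hn * (math.sqrt(lam * power) * p + math.sqrt((1 - lam) * power) * d)
                  for hn, p, d in zip(h, x_p, x_d)]

nmse_pilot_only = nmse(ls_estimate(y_pilot_only, x_p, lam, power), h)
nmse_superimposed = nmse(ls_estimate(y_superimposed, x_p, lam, power), h)
```

In this sketch the residual error equals $\sqrt{(1-\lambda)/\lambda}\, h(n) x_d(n)/x_p(n)$, so with unit-modulus pilot and data the noise-free NMSE is $(1-\lambda)/\lambda \approx 5.67$ for $\lambda = 0.15$. This illustrates why the raw LS output is only an initial feature that still needs refinement.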

B. CE-Net based Channel Estimation
To obtain a refined CE feature that differs from the conventional estimation perspective, we construct the lightweight and effective CE-Net, which learns the mapping relationship between the input and output data. Then, a refined estimation feature $\mathbf{h}_{CE}$ is captured through the CE-Net to complement the initial estimation feature $\mathbf{h}_{LS}$.
1) Network Design: According to [26], choosing the parameter settings of an NN, e.g., layer depth, layer width, and activation function, remains a challenge. Based on a large number of experimental simulations and performance tradeoffs, we determine that the CE-Net consists of $L$ layers, including an input layer, two hidden layers, and an output layer. Table I summarizes the CE-Net's architecture, which is described in detail below.
In the CE-Net, we set the number of neurons of the input layer and output layer as $2N$, hidden layer 1 as $6N$, and hidden layer 2 as $4N$ to reduce the complexity of the CE-Net compared to a deep network. To avoid the overfitting problem and accelerate the convergence of the CE-Net [27], we employ batch normalization (BN) to normalize the input layer. The hidden layers use the rectified linear unit (ReLU) activation function, defined as $f_a(x) = \max(0, x)$, to alleviate the gradient vanishing problem [28]. The output layer employs the linear activation function. With these parameters, the CE-Net refines the estimation performance.
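To facilitate the real-valued CE-Net, we reshape the complex-valued $\mathbf{h}_{LS} \in \mathbb{C}^{N \times 1}$ to the real-valued $\tilde{\mathbf{h}}_{LS} \in \mathbb{R}^{2N \times 1}$, which is formulated as
$$\tilde{\mathbf{h}}_{LS} = \left[ \mathrm{Re}(\mathbf{h}_{LS})^T, \mathrm{Im}(\mathbf{h}_{LS})^T \right]^T. \tag{7}$$
Next, the entries of $\tilde{\mathbf{h}}_{LS}$ form the inputs of the CE-Net. Via the CE-Net, the refined estimation feature, denoted as $\tilde{\mathbf{h}}_{CE} \in \mathbb{R}^{2N \times 1}$, is given by
$$\tilde{\mathbf{h}}_{CE} = f_{CE}\left( \tilde{\mathbf{h}}_{LS}, \Lambda_{CE} \right), \tag{8}$$
where $f_{CE}(\cdot)$ and $\Lambda_{CE}$ are the CE-Net operation and its network parameters, respectively. According to (8), we refine the estimation performance without using the second-order statistics of the channel.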
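The real-valued reshaping and the shallow structure of the CE-Net can be sketched as below. This is an untrained, pure-Python illustration with the layer widths stated above ($2N$, $6N$, $4N$, $2N$), ReLU hidden layers and a linear output; the input batch normalization is omitted, the random weights are placeholders, and all names are hypothetical.

```python
import random


def to_real(v):
    """Stack real parts then imaginary parts: C^N -> R^(2N)."""
    return [z.real for z in v] + [z.imag for z in v]


def to_complex(v):
    """Inverse of to_real: first half is the real part, second half the imaginary part."""
    n = len(v) // 2
    return [complex(v[k], v[n + k]) for k in range(n)]


def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, bias, v):
    """Fully connected layer: weights has one row per output neuron."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Placeholder random initialisation (not trained parameters)."""
    return [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def ce_net_forward(layers, v):
    """ReLU on the hidden layers, linear activation on the output layer."""
    for weights, bias in layers[:-1]:
        v = relu(dense(weights, bias, v))
    weights, bias = layers[-1]
    return dense(weights, bias, v)


N = 4                                   # small N keeps the toy example fast
widths = [2 * N, 6 * N, 4 * N, 2 * N]   # input, hidden 1, hidden 2, output
rng = random.Random(0)
layers = [init_layer(widths[k + 1], widths[k], rng) for k in range(len(widths) - 1)]

h_ls = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
h_ce_real = ce_net_forward(layers, to_real(h_ls))   # real-valued network output
h_ce = to_complex(h_ce_real)                        # back to the complex form
```

Splitting into real and imaginary halves lets a standard real-valued network process complex CFRs; the same convention is reused when the network output is converted back to a complex vector.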
2) Training and Deployment: A large number of data samples are collected to train the CE-Net. Specifically, the generation of these data samples is described below. For the CE-Net, the training set is presented as $\{\tilde{\mathbf{h}}_{LS}, \tilde{\mathbf{h}}_{Label}\}$. In this paper, the frequency-selective fading channel, i.e., $\mathbf{h}_{Label}$, is derived from the widely used channel model COST2100 [29]. A Zadoff-Chu sequence is employed as the pilot $\mathbf{x}_p$, and the modulated signal $\mathbf{x}_d$ is drawn from a quadrature phase-shift keying (QPSK) symbol set [30]. According to (1)-(4), the set of received signals is formed as $\{\mathbf{y}\}$. From (6), we obtain the set $\{\mathbf{h}_{LS}\}$. Finally, the complex-valued sets $\{\mathbf{h}_{Label}\}$ and $\{\mathbf{h}_{LS}\}$ are reshaped to the real-valued sets $\{\tilde{\mathbf{h}}_{Label}\}$ and $\{\tilde{\mathbf{h}}_{LS}\}$, respectively. We use the training set $\{\tilde{\mathbf{h}}_{LS}, \tilde{\mathbf{h}}_{Label}\}$ to train the CE-Net. The details are elaborated in Algorithm 1. In addition, to verify the trained network parameters during the training phase, the same generation method is also used to generate a validation set [30].
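Besides, we employ the criterion of minimizing the mean squared error (MSE) to train the CE-Net, and the loss function is expressed as
$$\mathrm{Loss}_{\text{CE-Net}} = \frac{1}{S_1} \sum_{i=1}^{S_1} \left\| \tilde{\mathbf{h}}_{CE}^{(i)} - \tilde{\mathbf{h}}_{Label}^{(i)} \right\|_2^2 + \beta_{CE} \sum_{\ell} \left\| \mathbf{W}_{\ell} \right\|_2^2, \tag{9}$$
where $S_1$ represents the number of training samples, $\beta_{CE}$ denotes the regularization coefficient which is used to avoid overfitting, $\mathbf{W}_{\ell}$ denotes the weights of the $\ell$-th layer, and $\ell$ is the layer index.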
The training set $\{\tilde{\mathbf{h}}_{LS}, \tilde{\mathbf{h}}_{Label}\}$ has 100,000 samples [20], [30]-[32], and the batch size is set as 80 samples. The validation set of the CE-Net has 20,000 samples. The CE-Net is trained for 40 epochs. The Adam optimizer [33] is used as the training optimization algorithm with parameters $\beta_1 = 0.99$ and $\beta_2 = 0.999$ [34]. The learning rate is set as 0.001, and $L_2$ regularization [35] is performed for the CE-Net.
The CE-Net is trained only once, offline. The trained network is then deployed for online running.
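As a rough illustration of this training configuration, the sketch below computes the number of parameter updates implied by the stated sample count, batch size and epoch number, and implements a single scalar Adam step with the stated hyperparameters. The scalar toy objective and the value of `eps` are assumptions made for the example; they are not taken from the paper.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.99, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter (t is the 1-based step index)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Training schedule implied by the stated CE-Net settings.
train_samples, batch_size, epochs = 100_000, 80, 40
updates_per_epoch = train_samples // batch_size
total_updates = updates_per_epoch * epochs

# Toy objective (theta - 3)^2, used only to exercise the update rule.
theta, m, v = 0.0, 0.0, 0.0
trajectory = []
for t in range(1, 6):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
    trajectory.append(theta)
```

With bias correction, the very first Adam step has magnitude close to the learning rate regardless of the gradient scale, which is one reason a fixed learning rate of 0.001 is a common default.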

C. Equalization Feature Extraction
To avoid using the second-order statistics of the noise, we employ ZF equalization to obtain the initial equalization value, which is also part of the input of FUS-Net.
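From (1), the pilot $\mathbf{x}_p$ is superimposed on the modulated signal $\mathbf{x}_d$. With the received signal $\mathbf{y}$, the ZF equalization is first used to highlight the initial feature for SD. Based on the refined estimate of the CE-Net (i.e., $\mathbf{h}_{CE}$) and $\mathbf{y}$, the ZF equalization is formulated as
$$\mathbf{s}_{ZF} = \mathbf{G}_{ZF}\, \mathbf{y}, \tag{10}$$
where $\mathbf{s}_{ZF}$ is the initial equalization feature, and $\mathbf{G}_{ZF} \in \mathbb{C}^{N \times N}$ denotes the ZF equalization matrix, which is given by
$$\mathbf{G}_{ZF} = \mathrm{diag}\left( \frac{1}{h_{CE}(1)}, \frac{1}{h_{CE}(2)}, \cdots, \frac{1}{h_{CE}(N)} \right), \tag{11}$$
where $h_{CE}(n)$ denotes the $n$-th entry of $\mathbf{h}_{CE}$. According to (10), we obtain the superimposed data and pilot $\mathbf{s}_{ZF}$. Subsequently, we cancel the superimposed interference from the pilot to obtain the coarse data $\mathbf{s}_d$, which is given as
$$\mathbf{s}_d = \mathbf{s}_{ZF} - \sqrt{\lambda P}\, \mathbf{x}_p. \tag{12}$$
The coarse data $\mathbf{s}_d$ is then used as the extracted feature for the subsequent data recovery.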
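The two feature-extraction steps of this subsection can be sketched as follows. The example assumes a diagonal (per-sub-carrier) ZF equalizer and that the known pilot is removed with its power scaling $\sqrt{\lambda P}$; the toy channel is arbitrary and the function names are hypothetical.

```python
import cmath
import math


def zf_equalize(y, h_ce):
    """Per-sub-carrier zero forcing: divide by the estimated CFR."""
    return [yn / hn for yn, hn in zip(y, h_ce)]


def cancel_pilot(s_zf, x_p, lam, power):
    """Subtract the known, power-scaled pilot to obtain the coarse data."""
    scale = math.sqrt(lam * power)
    return [s - scale * p for s, p in zip(s_zf, x_p)]


lam, power, N = 0.15, 1.0, 8
h = [(1 + 0.1 * n) * cmath.exp(1j * 0.3 * n) for n in range(N)]            # toy CFR
x_p = [cmath.exp(-1j * math.pi * n * n / N) for n in range(N)]             # pilot
x_d = [cmath.exp(1j * math.pi * (2 * (n % 4) + 1) / 4) for n in range(N)]  # QPSK data
y = [hn * (math.sqrt(lam * power) * p + math.sqrt((1 - lam) * power) * d)
     for hn, p, d in zip(h, x_p, x_d)]                                     # noise-free

# With a perfect channel estimate, the coarse data is the power-scaled data symbol.
s_zf = zf_equalize(y, h)
s_d = cancel_pilot(s_zf, x_p, lam, power)
```

With an imperfect estimate, the pilot is not cancelled exactly and the residual appears as interference in $\mathbf{s}_d$, which is why the coarse data is refined further by the FUS-Net rather than used directly.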

D. Fusion Learning-based Symbol Detection
To refine the coarse data $\mathbf{s}_d$, we draw on the idea of multimodal feature-level fusion and design a lightweight FUS-Net, which fuses the coarse data feature (from the simplified equalization method using Eq. (12)) and the received signal.
1) Network Design: After the simplified ZF equalization, the lightweight FUS-Net is utilized to refine the detection performance. Similar to the CE-Net, based on extensive experiments, the FUS-Net is composed of an input layer, a hidden layer, and an output layer. The numbers of neurons in the input layer, hidden layer and output layer are $4N$, $8N$, and $2N$, respectively. The activation function is the same as that of the CE-Net [27]. BN is also used to normalize the input sets of the FUS-Net, so that the network input has zero mean and unit variance. Table I summarizes the FUS-Net's architecture, as described below.
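The input $\tilde{\mathbf{s}}_{in} \in \mathbb{R}^{4N \times 1}$ of the FUS-Net is spliced from $\mathbf{s}_d$ and $\mathbf{y}$, i.e.,
$$\tilde{\mathbf{s}}_{in} = \left[ \mathrm{Re}(\mathbf{s}_d)^T, \mathrm{Im}(\mathbf{s}_d)^T, \mathrm{Re}(\mathbf{y})^T, \mathrm{Im}(\mathbf{y})^T \right]^T. \tag{13}$$
Next, using the FUS-Net, the output $\tilde{\mathbf{s}}_{FUS}$ is obtained by
$$\tilde{\mathbf{s}}_{FUS} = f_{FUS}\left( \tilde{\mathbf{s}}_{in}, \Lambda_{FUS} \right), \tag{14}$$
where $f_{FUS}(\cdot)$ and $\Lambda_{FUS}$ are the fusion network operation and its network parameters, respectively.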
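The splicing of the FUS-Net input can be illustrated with a few lines of code. The ordering of the four real-valued blocks (real and imaginary parts of the coarse data, followed by those of the received signal) is an assumption of this sketch, and the function name is hypothetical.

```python
def splice_input(s_d, y):
    """Build the 4N-dimensional real input from two complex N-vectors."""
    return ([z.real for z in s_d] + [z.imag for z in s_d]
            + [z.real for z in y] + [z.imag for z in y])


N = 3
s_d = [complex(1, 2), complex(3, 4), complex(5, 6)]        # toy coarse data
y = [complex(-1, -2), complex(-3, -4), complex(-5, -6)]    # toy received signal
s_in = splice_input(s_d, y)

fus_widths = (4 * N, 8 * N, 2 * N)   # input, hidden, output widths stated in the text
```

Feeding the received signal alongside the coarse data gives the network access to information that the ZF step and pilot cancellation may have distorted.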
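2) Training and Deployment: Similar to the CE-Net, a large number of data samples are collected to train the FUS-Net. The training details are explained as follows.
According to (13), the input of the FUS-Net $\tilde{\mathbf{s}}_{in}$ is obtained to form the real-valued fusion set $\{\tilde{\mathbf{s}}_{in}\}$. Then, the real-valued $\{\tilde{\mathbf{s}}_{in}\}$ and $\{\tilde{\mathbf{x}}_d\}$ form the training set $\{\tilde{\mathbf{s}}_{in}, \tilde{\mathbf{x}}_d\}$ to train the FUS-Net. The details are described in Algorithm 1. Besides, a validation set is also needed. The loss function of the FUS-Net is given as
$$\mathrm{Loss}_{\text{FUS-Net}} = \frac{1}{S_2} \sum_{i=1}^{S_2} \left\| \tilde{\mathbf{s}}_{FUS}^{(i)} - \tilde{\mathbf{x}}_d^{(i)} \right\|_2^2 + \beta_{FUS} \sum_{r} \left\| \mathbf{W}_{r} \right\|_2^2,$$
where $S_2$ denotes the number of training samples for the FUS-Net, $\beta_{FUS}$ is the regularization coefficient, $\mathbf{W}_r$ denotes the weights of the $r$-th layer, and $r$ is the layer index.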
The training set $\{\tilde{\mathbf{s}}_{in}, \tilde{\mathbf{x}}_d\}$ has 100,000 samples [20], [30]-[32], and the batch size is set as 80. The validation set of the FUS-Net has 20,000 samples. The FUS-Net is trained for 100 epochs. The Adam optimizer [33] is used as the training optimization algorithm with parameters $\beta_1 = 0.99$ and $\beta_2 = 0.999$ [34]. The learning rate is set as 0.001, and $L_2$ regularization [35] is used for the FUS-Net. For network training, we adopt mixed SNRs, i.e., the training samples are generated for SNRs from 0 dB to 18 dB.

E. Online Deployment
With the network parameters of the CE-Net and FUS-Net obtained through offline training, the procedure of online running is described in Algorithm 1. Explanations of Algorithm 1 are given below.
In the phase of online running, the received signal $\mathbf{y}$ and the known pilot $\mathbf{x}_p$ are employed to perform the LS estimation by using Eq. (6). Then, the initial estimation $\mathbf{h}_{LS}$ is obtained, and thus forms the network input of the CE-Net (i.e., $\tilde{\mathbf{h}}_{LS}$) using Eq. (7). With the network input $\tilde{\mathbf{h}}_{LS}$, the CE-Net refines the CE, and thus acquires the refined estimation feature $\tilde{\mathbf{h}}_{CE}$ in real-valued form using Eq. (8). The complex-valued form of the estimation feature, i.e., $\mathbf{h}_{CE}$, is obtained by extracting the real and imaginary parts from $\tilde{\mathbf{h}}_{CE}$, i.e.,
$$\mathbf{h}_{CE} = \tilde{\mathbf{h}}_{CE}(1:N) + j\, \tilde{\mathbf{h}}_{CE}(N+1:2N).$$
That is, the real part and imaginary part of $\mathbf{h}_{CE}$ are formed by the first $N$ entries and the last $N$ entries of $\tilde{\mathbf{h}}_{CE}$, respectively. With the estimated $\mathbf{h}_{CE}$, the ZF equalization is employed using Eq. (10), and thus $\mathbf{s}_{ZF}$ is obtained. Then, we cancel the superimposed interference to obtain the coarse data $\mathbf{s}_d$ using Eq. (12). By utilizing Eq. (13), the real-valued $\tilde{\mathbf{s}}_{in}$ is formed based on the complex-valued $\mathbf{s}_d$ and $\mathbf{y}$. With the network input $\tilde{\mathbf{s}}_{in}$, the FUS-Net fuses the coarse data feature and the received signal, and outputs the detected symbol $\tilde{\mathbf{s}}_{FUS}$ by using Eq. (14). In this way, following Algorithm 1, the refined high-precision detection is obtained from the proposed CE-Net and FUS-Net.
Compared with the conventional methods, e.g., the MMSE CE and MMSE SD, the proposed method demonstrates a better detection performance, e.g., a lower BER. It is noteworthy that the performance of the proposed method is refined without any second-order statistics of the wireless noise and channel.
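The online data flow described above can be summarized in a short sketch. The two networks are passed in as callables; here they are replaced by hypothetical stand-ins (a "genie" CE-Net that returns the true channel and a FUS-Net that simply passes the coarse data through), so the example only demonstrates the order of operations and the tensor shapes, not the learned behaviour. The pilot power scaling in the LS and cancellation steps is an assumption of the sketch.

```python
import cmath
import math


def to_real(v):
    return [z.real for z in v] + [z.imag for z in v]


def to_complex(v):
    n = len(v) // 2
    return [complex(v[k], v[n + k]) for k in range(n)]


def online_detect(y, x_p, lam, power, ce_net, fus_net):
    """LS -> CE-Net -> ZF -> pilot cancellation -> splice -> FUS-Net."""
    scale_p = math.sqrt(lam * power)
    h_ls = [yn / (scale_p * pn) for yn, pn in zip(y, x_p)]   # LS estimate, Eq. (6)
    h_ce = to_complex(ce_net(to_real(h_ls)))                 # Eqs. (7), (8), complex form
    s_zf = [yn / hn for yn, hn in zip(y, h_ce)]              # ZF equalization, Eq. (10)
    s_d = [s - scale_p * p for s, p in zip(s_zf, x_p)]       # pilot cancellation, Eq. (12)
    s_in = to_real(s_d) + to_real(y)                         # 4N real input, Eq. (13)
    return to_complex(fus_net(s_in))                         # detected symbols, Eq. (14)


lam, power, N = 0.15, 1.0, 4
a = 1 / math.sqrt(2)
h_true = [complex(1 + 0.2 * n, 0.5 - 0.1 * n) for n in range(N)]   # toy CFR
x_p = [cmath.exp(-1j * math.pi * n * n / N) for n in range(N)]
x_d = [complex(a, a), complex(-a, a), complex(a, -a), complex(-a, -a)]
y = [hn * (math.sqrt(lam * power) * p + math.sqrt((1 - lam) * power) * d)
     for hn, p, d in zip(h_true, x_p, x_d)]                        # noise-free

# Hypothetical stand-ins for the trained networks.
genie_ce_net = lambda _h_ls_real: to_real(h_true)          # ignores its input
passthrough_fus_net = lambda s_in: s_in[:len(s_in) // 2]   # returns the coarse data

s_fus = online_detect(y, x_p, lam, power, genie_ce_net, passthrough_fus_net)
```

In a real deployment the two callables would be the trained CE-Net and FUS-Net loaded once at start-up, so each received symbol costs only two shallow forward passes plus element-wise divisions.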
Remark 1: Battery Life and Spectral Efficiency
Relative to IoT systems that employ neither superimposed pilots nor the RIS, the proposed method improves the battery life of the UE and the spectral efficiency of an IoT system. Owing to the superimposition mode, the energy consumption of an IoT UE is significantly reduced given the same transmitted power. Besides, compared with IoT systems without the RIS, the application of the RIS increases the communication reliability and thus also reduces the energy consumption for a similar communication quality. For an IoT system with limited bandwidth, the use of superimposed pilots effectively improves spectral efficiency. Therefore, compared with IoT systems that employ neither superimposed pilots nor the RIS, the proposed superimposed pilot-based CE with RIS assistance significantly prolongs the UE's battery life and improves the spectral efficiency of IoT systems.
The proposed method adopts the superimposed pilot mode, and the UE does not need extra resources for pilot transmission. Thus, compared with non-superimposed pilot-based CE methods [6], [7], the spectral efficiency is improved. Meanwhile, the energy consumption at the UE is reduced because extra energy for pilot transmission is avoided. Table III shows the comparison of bandwidth resource occupation and energy consumption between the non-superimposed pilot-based CE method [6], [7] and the proposed method in this paper.
By denoting the energy consumption of the non-superimposed pilot-based CE as $E_{NonSup}$, we have
$$E_{NonSup} = \left( N_{data} + N_{Pilot} \right) T_0 P,$$
where $N_{data}$ denotes the number of data symbols, $N_{Pilot}$ represents the number of pilot symbols, $T_0$ is the symbol duration, and $P$ stands for the transmitted power.
Compared with the non-superimposed pilot-based CE [6], [7], the superimposed pilot mode saves energy at the UE because the extra energy consumption for pilot transmission is avoided. In this paper, the energy consumption of the proposed scheme is denoted as $E_{Prop}$, which can be expressed as
$$E_{Prop} = N_{data} T_0 \left[ \lambda P + (1-\lambda) P \right] = N_{data} T_0 P,$$
where $\lambda$ denotes the power proportional coefficient. Then, compared with the non-superimposed pilot-based CE, the energy saved by using the proposed method is given by
$$E_{NonSup} - E_{Prop} = N_{Pilot} T_0 P.$$
In terms of bandwidth resource occupation, the proposed method transmits the pilot in a superimposed manner, in which the time of bandwidth occupation is $N_{data} T_0$. In contrast, the bandwidth resource occupation of the non-superimposed pilot-based CE is $(N_{data} + N_{Pilot}) T_0$. Thus, relative to the non-superimposed pilot-based CE, the proposed method reduces the bandwidth resource occupation by
$$\left( N_{data} + N_{Pilot} \right) T_0 - N_{data} T_0 = N_{Pilot} T_0.$$
By considering the case where $N_{data} = 32$ and $N_{Pilot} = 32$, it can be seen from Table III that, compared with the CE method based on non-superimposed pilots, the proposed method reduces the bandwidth resource occupation and energy consumption. To sum up, compared with the non-superimposed pilot-based CE, the proposed method improves the spectral and energy efficiency of RIS-assisted IoT systems.
In addition to the benefits in Remark 1, the proposed superimposed pilot-based CE with RIS assistance also reduces the computational complexity and processing delay at the BS, compared with IoT systems that employ neither superimposed pilots nor the RIS. The specific analysis and comparison of the computational complexity and processing delay at the BS are presented in Section IV.
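The comparison in Remark 1 reduces to simple arithmetic, sketched below for the case $N_{data} = N_{Pilot} = 32$. The sketch assumes that every transmitted symbol uses the same total power $P$ and that the superimposed symbol splits this power between pilot and data; the values of $T_0$ and $P$ are placeholders, and the function names are hypothetical.

```python
def energy_non_superimposed(n_data, n_pilot, t0, power):
    """Pilot and data symbols are sent separately, each with power P."""
    return (n_data + n_pilot) * t0 * power


def energy_superimposed(n_data, t0, power, lam):
    """Pilot (lam * P) and data ((1 - lam) * P) share every symbol."""
    return n_data * t0 * (lam * power + (1 - lam) * power)


n_data, n_pilot = 32, 32
t0, power, lam = 1.0, 1.0, 0.15      # placeholder symbol duration and power

e_non_sup = energy_non_superimposed(n_data, n_pilot, t0, power)
e_prop = energy_superimposed(n_data, t0, power, lam)
energy_saving_ratio = (e_non_sup - e_prop) / e_non_sup

time_non_sup = (n_data + n_pilot) * t0
time_prop = n_data * t0
bandwidth_saving_ratio = (time_non_sup - time_prop) / time_non_sup
```

Under these assumptions both ratios equal $N_{Pilot} / (N_{data} + N_{Pilot})$, i.e., one half for the 32/32 case, independently of $\lambda$, $T_0$ and $P$.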

IV. COMPLEXITY AND RUNNING TIME ANALYSES
For convenience, the following abbreviations are used.
• "LS-CE", "MMSE-CE" and "CE-Net" are used to represent the "LS channel estimation", "MMSE channel estimation", and "proposed CE-Net", respectively.
• "MMSE-CE + MMSE-SD", "CE-Net + ZF" and "proposed" are used to denote the "MMSE channel estimation followed by MMSE equalization", "proposed CE-Net followed by ZF equalization", and "proposed CE-Net followed by FUS-Net", respectively.
1) Computational Complexity: As the most common criteria, the weight number and floating-point operations (FLOPs) are used to describe the computational complexity of an NN [32]. In this paper, we employ these criteria to compare the computational complexity, which is elaborated in Table II, and some details are given as follows.
According to the computing method given in [32], the total NN weight number of the proposed CE-Net and FUS-Net is $28N^2 + 8N$, and the total FLOPs number is $56N^2 - 8N$. Thus, the computational complexity of the proposed method (including the CE-Net and FUS-Net) is $28N^2 + 8N + 56N^2 - 8N = 84N^2$. As shown in Table II, the proposed method has lower computational complexity than the "MMSE-CE + MMSE-SD". For the case where $N = 32$, i.e., case 1 in Table II, the computational complexity of the "MMSE-CE + MMSE-SD" is 200,768, whereas the computational complexity of the "proposed" is 86,016. When $N = 64$ (i.e., case 2 in Table II), the computational complexity of the "MMSE-CE + MMSE-SD" is 1,589,376, while the computational complexity of the "proposed" is 344,064. On the whole, compared with the "MMSE-CE + MMSE-SD", the proposed method reduces the computational complexity and thus obtains a corresponding reduction in energy consumption.
2) Running Time: The training of the proposed method is performed on a server with an Intel Xeon(R) E5-2620 CPU 2.1GHz×16, and the results are obtained by MATLAB simulation on the server CPU due to the lack of a GPU solution for the "MMSE-CE + MMSE-SD". The details of the running time are shown in Fig. 3. For the case where $G = 12$, the total online running time of the "proposed" is about 60 seconds for the two networks, whereas that of the "MMSE-CE + MMSE-SD" is about 91 seconds. It can be seen that the online running time of the proposed method is less than that of the "MMSE-CE + MMSE-SD", prolonging the battery life of the UE as well.
Thus, compared with the "MMSE-CE + MMSE-SD", the "proposed" significantly reduces the computational complexity and running time.
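The complexity figures quoted in this section can be checked with a few lines of arithmetic. The sketch below evaluates the stated weight and FLOPs expressions for $N = 32$ and $N = 64$ and compares them with the MMSE figures quoted in the text; the MMSE values are copied from the text rather than derived, and the function names are hypothetical.

```python
def nn_weights(n):
    """Total weight number of CE-Net and FUS-Net as stated in the text."""
    return 28 * n * n + 8 * n


def nn_flops(n):
    """Total FLOPs number of CE-Net and FUS-Net as stated in the text."""
    return 56 * n * n - 8 * n


def proposed_complexity(n):
    """Sum of weights and FLOPs, which simplifies to 84 * n^2."""
    return nn_weights(n) + nn_flops(n)


# Figures for "MMSE-CE + MMSE-SD" quoted in the text (not derived here).
quoted_mmse = {32: 200_768, 64: 1_589_376}

reduction = {n: quoted_mmse[n] / proposed_complexity(n) for n in quoted_mmse}
```

The ratio grows from about 2.3 at $N = 32$ to about 4.6 at $N = 64$, so the advantage of the proposed method in this metric widens as the number of sub-carriers increases.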

V. SIMULATION RESULTS AND ANALYSIS
In this section, numerical results of the proposed method are given. The basic parameters and definitions involved in the simulations are presented in Section V-A. Then, in Sections V-B and V-C, the simulation results verify the effectiveness of the proposed method in terms of NMSE and BER, respectively. Finally, the parameter robustness analyses are elaborated in Section V-D.

A. Parameters and Definitions
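In all the experiments, unless otherwise specified, the following basic parameters are used. The pilot is a Zadoff-Chu sequence [24], $L = 5$, $N = 32$, $\lambda = 0.15$, and $G = 12$.
The channel is generated by the channel model COST2100 [29], and the transmitted data symbol is modulated by QPSK modulation. The signal-to-noise ratio (SNR) in decibels (dB) is expressed as [36]
$$\mathrm{SNR} = 10 \log_{10} \left( \frac{P}{\sigma_w^2} \right),$$
where $P$ is the total transmitted power of the superimposed data and pilot, which is equal to the sum of the data power $P_d$ and the pilot power $P_p$. In these simulations, $P_d = 0.85P$ and $P_p = 0.15P$.
The NMSE is utilized to evaluate the CE performance, which is defined as [36]
$$\mathrm{NMSE} = \frac{\left\| \hat{\mathbf{h}} - \mathbf{h} \right\|_2^2}{\left\| \mathbf{h} \right\|_2^2},$$
where $\hat{\mathbf{h}}$ and $\mathbf{h}$ denote the estimated and true CFRs, respectively.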
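The two definitions used in the simulations can be written as small helper functions. This is a minimal sketch assuming the SNR is the ratio of the total transmit power to the noise variance and the NMSE is the squared estimation error normalized by the channel energy; the function names are hypothetical.

```python
import math


def snr_db(power, sigma2):
    """SNR in dB for total transmit power P and noise variance sigma_w^2."""
    return 10 * math.log10(power / sigma2)


def noise_variance(power, snr_in_db):
    """Noise variance that yields a target SNR (in dB) for transmit power P."""
    return power / (10 ** (snr_in_db / 10))


def nmse(h_hat, h):
    """Normalized squared error between estimated and true CFR vectors."""
    err = sum(abs(a - b) ** 2 for a, b in zip(h_hat, h))
    return err / sum(abs(b) ** 2 for b in h)


# Noise variances for the training SNR range mentioned in the text (0 dB to 18 dB).
sigma2_low_snr = noise_variance(1.0, 0.0)
sigma2_high_snr = noise_variance(1.0, 18.0)
```

In a Monte Carlo simulation the per-realization NMSE values would additionally be averaged over many channel realizations before plotting.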

B. NMSE Performance Analysis
We validate the effectiveness of the proposed CE-Net in terms of the NMSE curves in Fig. 4. As shown in Fig. 4, the NMSE values of "LS-CE" and "MMSE-CE" are much higher than that of the "CE-Net" for all given SNRs. For example, the NMSE of the "CE-Net" is less than $1 \times 10^{-2}$ for the case of SNR = 18 dB, while the NMSE of the "MMSE-CE" is $2 \times 10^{-1}$ and that of the "LS-CE" is higher than $1 \times 10^{0}$ at the same SNR. The reason for the poor performance of the "LS-CE" is that the LS estimation is sensitive to noise and interference. The superimposed pilot is equivalent to introducing superimposed interference, which results in an unsatisfactory LS estimation. The NMSE of "MMSE-CE" is lower than that of the "LS-CE" due to the utilization of the second-order statistical information about the channel and noise, which comes at the cost of higher computational complexity. However, the NMSE of "MMSE-CE" is still unsatisfactory due to the influence of superimposed interference. In contrast, the developed CE-Net effectively alleviates the impact of superimposed interference by exploiting its learning ability for de-noising (suppressing the superimposed interference and noise) and feature extraction (learning the features of wireless channels). Thus, compared with the linear solutions obtained by the LS and MMSE-based CE, the developed CE-Net learns a nonlinear solution oriented by the LS solution, which improves the NMSE performance of the CE.

C. BER Performance Analysis
Since the pilot x_p is superimposed on the modulated symbol x_d, it is necessary to verify whether the superimposed interference (from the pilot) degrades the detection performance of the data symbol. In this paper, the BER is used as the metric of detection performance and is plotted in Fig. 5. We use "MMSE-CE+MMSE-SD" and "CE-Net+ZF" as baselines to evaluate the BER of the "proposed" method. As shown in Fig. 5, the BER of the "proposed" is much lower than that of "MMSE-CE+MMSE-SD". For example, at SNR = 14 dB, the BER of the "proposed" is less than 1 × 10^{-2}, while that of "MMSE-CE+MMSE-SD" is about 6.5 × 10^{-2}. Furthermore, the BER of "CE-Net+ZF" is lower than that of "MMSE-CE+MMSE-SD". One of the main reasons is that the poor NMSE performance of the CE in "MMSE-CE+MMSE-SD" affects the subsequent detection: the CE error propagates to the detection stage and thus degrades the detection performance of "MMSE-CE+MMSE-SD".
With the superior learning ability of the CE-Net, the NMSE performance of the "proposed" is improved, which in turn improves its BER performance. At the same time, the "proposed" achieves a lower BER than "CE-Net+ZF". For example, at SNR = 18 dB, the BER of the "proposed" is 1.2 × 10^{-3}, while that of "CE-Net+ZF" reaches 5 × 10^{-3}. Because the "proposed" includes an additional data fusion network, FUS-Net, it is better able to capture additional features for SD and thus effectively improves the BER performance.
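The way a channel estimate feeds into detection and BER can be sketched as follows. This is a simplified per-subcarrier ZF detector that equalizes with the estimate, subtracts the known pilot, and makes hard QPSK decisions; the model, the Gray mapping, and all function names are assumptions for illustration, not the paper's exact SD chain.

```python
import numpy as np

def qpsk_mod(bits):
    """Bit pairs -> unit-power Gray-coded QPSK symbols."""
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def qpsk_demod(symbols):
    """Hard decisions: a negative real/imaginary part maps to bit 1."""
    bits = np.empty((symbols.size, 2), dtype=int)
    bits[:, 0] = symbols.real < 0
    bits[:, 1] = symbols.imag < 0
    return bits.reshape(-1)

def zf_detect(y, h_est, x_p, lam):
    """Equalize with h_est, remove the known pilot, rescale the data part."""
    return (y / h_est - np.sqrt(lam) * x_p) / np.sqrt(1 - lam)

def ber(bits_tx, bits_rx):
    """Fraction of bit positions that differ."""
    return float(np.mean(bits_tx != bits_rx))

rng = np.random.default_rng(2)
N, lam = 32, 0.15
bits = rng.integers(0, 2, size=2 * N)
x_d = qpsk_mod(bits)
x_p = np.exp(-1j * np.pi * np.arange(N) ** 2 / N)   # assumed unit-modulus pilot
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = h * (np.sqrt(1 - lam) * x_d + np.sqrt(lam) * x_p)  # noise-free for clarity

ber_perfect = ber(bits, qpsk_demod(zf_detect(y, h, x_p, lam)))
```

With a perfect channel estimate and no noise, the pilot is removed exactly and `ber_perfect` is zero. Replacing `h` in `zf_detect` with an imperfect estimate leaves residual pilot and scaling errors in the decision variable, which is how CE error propagates into detection.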

D. Robustness Analysis
In this subsection, the robustness of the "proposed" method is analysed against varying parameters, i.e., the power proportional coefficient λ and the number of multipaths L. For convenience of analysis, only one parameter is changed at a time, while the other basic parameters remain as given in Section V-A.
1) Robustness Against λ: In general, different values of the power proportional coefficient λ result in different CE and SD performance for the superimposed signals. To demonstrate the robustness of the "proposed" method against λ, the NMSE of CE and the BER of SD are shown in Fig. 6.
As shown in Fig. 6, as λ increases from 0.1 to 0.2, the NMSEs of "LS-CE" and "MMSE-CE" decrease. Although the decline is not pronounced, the decreasing trend is still observable. For example, when SNR = 12 dB and λ changes from 0.1 to 0.2, the NMSE of "MMSE-CE" changes from 3 × 10^{-1} to 1.2 × 10^{-1}. The likely reason is that the CE performance improves with the increased pilot power. Meanwhile, the NMSE of the "proposed" remains stable and stays lower than those of "LS-CE" and "MMSE-CE" as λ increases. For example, at SNR = 12 dB and λ = 0.15, the NMSEs of "LS-CE" and "MMSE-CE" are higher than 1 × 10^{0} and 2 × 10^{-1}, respectively. By contrast, the NMSE of the "proposed" is about 1 × 10^{-2}.
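The dependence of the LS estimate on λ can be made explicit with a short worked derivation. It rests on a simplified per-subcarrier model with unit-modulus symbols and noise variance σ², all of which are assumptions introduced here rather than the paper's system model, so it only indicates the trend.

```latex
% Assumed model (not taken from the source):
%   y_k = h_k ( \sqrt{(1-\lambda)P} x_{d,k} + \sqrt{\lambda P} x_{p,k} ) + w_k,
%   |x_{d,k}| = |x_{p,k}| = 1,  E|w_k|^2 = \sigma^2,
%   x_{d,k} zero-mean and independent of h_k and w_k.
\begin{align}
\hat{h}_{\mathrm{LS},k}
  &= \frac{y_k}{\sqrt{\lambda P}\,x_{p,k}}
   = h_k
   + h_k \sqrt{\frac{1-\lambda}{\lambda}}\,\frac{x_{d,k}}{x_{p,k}}
   + \frac{w_k}{\sqrt{\lambda P}\,x_{p,k}}, \\
\mathrm{NMSE}_{\mathrm{LS}}
  &\approx \frac{1-\lambda}{\lambda}
   + \frac{\sigma^2}{\lambda P\,\mathbb{E}|h_k|^2}.
\end{align}
% Interference term (1-\lambda)/\lambda:
%   \lambda = 0.10 -> 9.00
%   \lambda = 0.15 -> 5.67
%   \lambda = 0.20 -> 4.00
```

Both terms shrink as λ grows, which agrees with the decreasing NMSE trend of the conventional estimators, while the interference term stays above 1 for any λ below 0.5.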
As λ increases, the BER performance of "CE-Net+ZF" and the "proposed" deteriorates slightly. For example, at SNR = 18 dB and λ = 0.1, the BERs are about 2.2 × 10^{-3} and 1 × 10^{-3}, respectively, whereas at SNR = 18 dB and λ = 0.2, they are about 8 × 10^{-3} and 2 × 10^{-3}, respectively. However, the BER of the "proposed" remains much lower than those of "CE-Net+ZF" and "MMSE-CE+MMSE-SD" for each given SNR and λ. Thus, against the impact of λ, the "proposed" improves the BER performance compared with "CE-Net+ZF" and "MMSE-CE+MMSE-SD". On the whole, compared with these two baselines, the "proposed" enhances both the NMSE and BER performance against the variation of λ.
2) Robustness Against L: The performance is usually influenced by the number of multipaths, i.e., L. To illustrate the robustness against the impact of L, the performance comparison is shown in Fig. 7, where L = 3, L = 5, and L = 7 are considered. As shown in Fig. 7, the NMSE does not follow a regular trend as L increases. The reason is that the NMSE performance is not directly related to the value of L. Even so, regardless of the value of L, "CE-Net" achieves the lowest NMSE and thus the best NMSE performance. For example, when SNR = 12 dB and L = 5, the NMSEs of "LS-CE" and "MMSE-CE" are higher than 1 × 10^{0} and 2 × 10^{-1}, respectively, while the NMSE of "CE-Net" is about 1.2 × 10^{-2}. This shows that the CE-Net improves the NMSE performance over the conventional "LS-CE" and "MMSE-CE" methods against the variation of L.
In addition, Fig. 7 shows that, compared with "MMSE-CE+MMSE-SD" and "CE-Net+ZF", the "proposed" achieves a lower BER for each given L. For example, at SNR = 18 dB and L = 5, the BERs of "MMSE-CE+MMSE-SD" and "CE-Net+ZF" are about 5.5 × 10^{-2} and 4.5 × 10^{-3}, respectively, while the BER of the "proposed" is lower than 2 × 10^{-3}. This shows that the "proposed" improves the BER over "MMSE-CE+MMSE-SD" and "CE-Net+ZF" against the variation of L. It is also worth noting that, for L = 5, each of the CE methods achieves its smallest NMSE, yet none of them achieves its best detection performance. This is because, with superimposed pilots, an improvement in estimation performance does not necessarily translate into a proportional improvement in detection performance, owing to the influence of superimposed interference. Thus, an effective option for the superimposed pilot-based method is to make a tradeoff between NMSE performance and BER performance.
Therefore, Fig. 7 shows that, against the impact of L, the "proposed" improves both the NMSE and BER performance compared with "CE-Net+ZF" and "MMSE-CE+MMSE-SD".

VI. CONCLUSION
In this paper, a superimposed pilot-based CE with RIS-assisted mode is proposed for IoT systems. The spectral efficiency and the energy consumption are improved by employing the superimposed pilot, and the issue of blocked propagation paths is alleviated by deploying the RIS. In addition, non-NN and NN-based modes are integrated at the BS to form lightweight networks, which effectively reduce the computational complexity and processing delay. Compared with the conventional methods, the proposed solution shows its effectiveness and robustness in improving the NMSE and BER performance. In future work, we will consider the influence of RIS materials on CE.

Fig. 1. An illustration of RIS-assisted OFDM communication in the uplink.
Algorithm 1 Fusion learning-based CE and SD
Input: Initial estimation h_LS; training learning rate of CE-Net: γ_1; training learning rate of FUS-Net: γ_2; batch size: ν; number of gradsteps for CE-Net: G_CE; number of gradsteps for FUS-Net: G_FUS.
Output: Refined detection s_FUS.
Training phase:
1: Randomly initialize the network parameters Θ_CE and Θ_FUS.
2: Generate the training sets {h_LS, h_Label} and {s_in, x_d}.
3: for t = 1, ..., G_CE do
4:   Randomly select ν training samples from {h_LS, h_Label} as the training batch.
5:   Update Θ_CE by using the Adam algorithm (learning rate γ_1) to minimize Loss_CE-Net.
6: end for
7: for t = 1, ..., G_FUS do
8:   Randomly select ν training samples from {s_in, x_d} as the training batch.
9:
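The two training loops of Algorithm 1 can be sketched in a few lines. The sketch below is only a structural illustration: each network is replaced by a hypothetical real-valued linear map, the losses are plain mean squared errors, the training sets are synthetic toy data, and Adam is written out in NumPy with its usual default moment coefficients. The update in the second loop mirrors step 5, which is an assumption because the listing above is cut off at step 9.

```python
import numpy as np

def mse_loss(theta, x, y):
    """Mean squared error of the linear stand-in model x @ theta."""
    return float(np.mean((x @ theta - y) ** 2))

def mse_grad(theta, x, y):
    """Gradient of mse_loss with respect to theta."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / residual.size

def adam_stage(theta, inputs, labels, lr, batch_size, grad_steps, rng):
    """One training stage: random mini-batch, then an Adam update per step."""
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, grad_steps + 1):
        idx = rng.choice(len(inputs), size=batch_size, replace=False)
        g = mse_grad(theta, inputs[idx], labels[idx])
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

rng = np.random.default_rng(0)
dim, n_train = 8, 256
nu, gamma_1, gamma_2, G_CE, G_FUS = 32, 1e-2, 1e-2, 300, 300  # toy values

# Step 2: synthetic training sets (hypothetical stand-ins for the real data).
h_label = rng.standard_normal((n_train, dim))
h_ls = h_label + 0.5 * rng.standard_normal((n_train, dim))
x_d = rng.choice([-1.0, 1.0], size=(n_train, dim))
s_in = x_d + 0.3 * rng.standard_normal((n_train, dim))

# Step 1: random initialization of both parameter sets.
theta_ce = 0.1 * rng.standard_normal((dim, dim))
theta_fus = 0.1 * rng.standard_normal((dim, dim))

# Steps 3-6: train the CE stand-in on {h_LS, h_Label}.
loss_ce_before = mse_loss(theta_ce, h_ls, h_label)
theta_ce = adam_stage(theta_ce, h_ls, h_label, gamma_1, nu, G_CE, rng)
loss_ce_after = mse_loss(theta_ce, h_ls, h_label)

# Steps 7 onward: train the fusion stand-in on {s_in, x_d} (assumed update).
loss_fus_before = mse_loss(theta_fus, s_in, x_d)
theta_fus = adam_stage(theta_fus, s_in, x_d, gamma_2, nu, G_FUS, rng)
loss_fus_after = mse_loss(theta_fus, s_in, x_d)
```

The two stages are trained one after the other with separate learning rates and step counts, as in the listing; in practice the linear maps would be replaced by the CE-Net and FUS-Net architectures and their own loss functions.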

TABLE I
ARCHITECTURE OF CE-NET AND FUS-NET.

TABLE II
THE ANALYSIS OF COMPUTATIONAL COMPLEXITY.