A. Multi-level modulation system
We map the adaptive modulation scheme to a finite Markov decision process (MDP). Within this discrete, finite-state framework, the agent and the environment achieve their goals through interactive learning. In a finite MDP, the function \(p\) defines the dynamics of the MDP and specifies a probability distribution over the selection of each state and action.
To improve the bandwidth efficiency of the system, an underwater acoustic communication system usually adopts a multi-level modulation scheme. A set of signal constellation points represents the modulation level \({M}_{t}\) of each data symbol. We select the modulation schemes {BPSK, QPSK, 8-QAM, 16-QAM}, where \({M}_{1}=\left\{2,4\right\}\) covers the circular-constellation multiphase shift keying (MPSK) system and \({M}_{2}=\left\{8,16\right\}\) covers the square-constellation multilevel quadrature amplitude modulation (MQAM) system. Assuming the transmitter uses a constant symbol period \({T}_{s}\), the two sets are combined to obtain \(M=\left\{2,4,8,16\right\}\), and the length of each time slot is \({T}_{s}=\frac{1}{B}\), where \(B\) is the bandwidth of the received signal.
(1) State space: since the receiver obtains the CSI of the feedback link, including channel gain, multipath, noise, and other information, we define the state of each time slot as \({S}_{t}=\left\{{s}_{1},{s}_{2},{s}_{3},\dots \right\}\).
(2) Action space: in the OFDM underwater acoustic communication system model, the transmitter automatically adjusts and selects the modulation scheme according only to the current feedback state in each time slot, so the action of each time slot is defined as \({A}_{t}=\left\{{a}_{1},{a}_{2},{a}_{3},\dots \right\}\), where the modulation mode \({a}_{i}\) adopted for the \({i}^{th}\) time slot is selected from the given constellation set \(M\).
(3) Immediate reward function: each time slot obtains an immediate reward \({R}_{t}=\left\{{r}_{1},{r}_{2},{r}_{3},\dots \right\}\) based upon the feedback, comprising both reward and punishment. The reward is directly proportional to the number of data bits successfully transmitted and is related to the system throughput; the punishment is related only to the BER.
A complete MDP consists of a four-tuple. Given the initial state and action, the probability that \(s'\in S\) and \(r\in R\) occur at time \(t\) is \(p\left(s',r|s,a\right)=\Pr\left\{{S}_{t}=s',{R}_{t}=r|{S}_{t-1}=s,{A}_{t-1}=a\right\}\). However, in our adaptive modulation scheme in an underwater channel environment, this probability distribution for each \(s\) and \(a\) cannot be obtained, so the problem is defined as the MDP triple \(\langle S,A,R\rangle\).
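As an illustrative sketch of the \(\langle S,A,R\rangle\) triple, the Python snippet below models one slot of agent-environment interaction; the state layout, helper names, and stand-in values are our own assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of the MDP triple <S, A, R> for adaptive modulation.
# The state is the fed-back CSI for the slot; values here are assumptions.

ACTIONS = [1, 2, 4, 8, 16]   # constellation set M; M_t = 1 means "do not transmit"

def slot_step(csi_state, policy, measure_reward):
    """One time slot of agent-environment interaction:
    observe s_t, pick a modulation order a_t, receive r_t from feedback."""
    a_t = policy(csi_state)               # choose a_t from ACTIONS given s_t
    r_t = measure_reward(csi_state, a_t)  # reward/punishment, cf. Eq. (15)
    return a_t, r_t

# Example with trivial stand-ins for the policy and the reward measurement:
a, r = slot_step(csi_state=[0.8, 0.1, 12.0],
                 policy=lambda s: 8,
                 measure_reward=lambda s, a: 1.0)
```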
B. Value calculation
We consider designing an adaptive transmission system based on the SNR \({\gamma }_{t}\). If the average data rate is maximized only under a fixed target BER, then \(k\left({\gamma }_{t}\right)\) can be set equal to \({\log}_{2}{M}_{t}\), which accommodates general adaptive M-ary modulation. The accurate BER is obtained through actual transmission: we send known data in each time slot, feed back the bit errors, and compute the proportion of bits incorrectly received by the receiver out of the total transmitted bits.
Considering the various overheads in the communication system, we calculate the total data rate with the coding rate \(r\) held constant. Measured in bits per second per hertz, the spectral efficiency is:
\(\psi =\left(r\times \sigma \right)\times \frac{T}{{T}_{b1}}\times \frac{{K}_{d}}{K},\)  (13)
where \(\sigma\) is the number of bits per symbol, \(\sigma ={\log}_{2}M\), and \(M\) is the modulation order; \({T}_{b1}\) is the OFDM symbol length, \({T}_{b1}=T+{T}_{cp}\), where \({T}_{cp}\) is the length of the cyclic prefix and \(T\) is the basic OFDM symbol interval; \(K\) is the number of subcarriers (the FFT size); and \({K}_{d}\) is the number of data subcarriers.
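For concreteness, a minimal Python transcription of Eq. (13) follows; the parameter values in the example are placeholders, not the system's actual configuration.

```python
import math

def spectral_efficiency(r, M, T, T_cp, K, K_d):
    """Spectral efficiency psi of Eq. (13), in bits/s/Hz.

    r    : coding rate
    M    : modulation order, so sigma = log2(M) bits per symbol
    T    : basic OFDM symbol interval
    T_cp : cyclic prefix length (T_b1 = T + T_cp)
    K    : number of subcarriers (FFT size)
    K_d  : number of data subcarriers
    """
    sigma = math.log2(M)
    return (r * sigma) * (T / (T + T_cp)) * (K_d / K)

# Example with assumed (illustrative) parameter values:
psi = spectral_efficiency(r=0.5, M=16, T=0.1, T_cp=0.025, K=1024, K_d=768)
```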
Unlike the Shannon capacity, in the multi-level modulation system we use the number of bits sent per unit time as the system throughput:
\(T=\psi \times \max\left[{P}_{cf}\times \left(r\times \sigma \right)\right],\)  (14)
where \({P}_{cf}\) is related to the system BER \(\rho\) through \({P}_{cf}=1-\rho\).
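Under one reading of Eq. (14), the max is taken over the candidate modulation orders; the hedged sketch below follows that reading, with the per-order BER measurements as assumed inputs.

```python
import math

def throughput(psi, candidates, r):
    """Throughput per Eq. (14): psi scaled by the best P_cf * (r * sigma)
    over the candidate modulation orders. `candidates` maps each order M
    to its measured BER rho (assumed inputs)."""
    best = max((1.0 - rho) * (r * math.log2(M)) for M, rho in candidates.items())
    return psi * best

# Example with assumed BER measurements per order:
t = throughput(psi=1.2, candidates={4: 1e-4, 8: 1e-3, 16: 2e-2}, r=0.5)
```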
The penalty bit \(\varTheta\) sets the reward to zero: if the BER requirement is not met, the reward is set to 0. The value function is defined as:
\(r\left(s,a\right)=\left\{\begin{array}{ll}-{c}_{1}\cdot \rho +{c}_{2}\cdot \psi +{c}_{3}\cdot T, & \varTheta =0,\ \rho \le {P}_{b}\\ 0, & \varTheta =1,\ \rho >{P}_{b}\end{array}\right.\)  (15)
where \({c}_{1},{c}_{2},{c}_{3}\) are constants representing the weight of each term in the value calculation. No signal is transmitted in the \({t}^{th}\) time slot if the penalty bit \(\varTheta\) is set to one for every action. This happens when the current environment state is not ideal: according to our automatic modulation strategy, even the lowest-order modulation mode cannot meet the system requirements, so the optimal strategy is not to transmit.
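A direct transcription of Eq. (15) follows; the weights \(c_1, c_2, c_3\) are unspecified constants in the text, so the defaults below are placeholders.

```python
def reward(rho, psi, T, P_b, c1=1.0, c2=1.0, c3=1.0):
    """Reward of Eq. (15). Returns (r, Theta): the penalty bit Theta = 1
    zeroes the reward whenever the measured BER rho exceeds the target P_b.
    The weights c1..c3 are placeholders, not the paper's values."""
    if rho <= P_b:
        return -c1 * rho + c2 * psi + c3 * T, 0
    return 0.0, 1
```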
C. Optimization problems
According to the information returned over the feedback link, we formulate the optimization problem as:
\(\begin{array}{c}\underset{{M}_{t}}{\max}\ {r}_{t}\left({s}_{t},{a}_{t}\right)\\ \text{s.t.}\left\{\begin{array}{ll}{M}_{t}\in M=\left\{1,2,4,8,16\right\}, & \forall t=1,2,\dots ,N\\ {\rho }_{t}\le {P}_{b,t}, & \forall t=1,2,\dots ,N\end{array}\right.\end{array}\)  (16)
where \({M}_{t}=1\) is the action of not transmitting, in which the transmitter remains in a static waiting state.
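A brute-force per-slot solution of Eq. (16) can be sketched as follows; `per_order_stats` is a hypothetical container for the measured reward and BER of each candidate order, not an interface from the paper.

```python
def best_modulation(per_order_stats, P_b):
    """Exhaustive per-slot solution of Eq. (16): among the orders whose
    measured BER meets the target, pick the one with the largest reward;
    fall back to M_t = 1 (no transmission) if none qualifies.
    per_order_stats maps M -> (reward, rho); assumed measured inputs."""
    feasible = {M: rwd for M, (rwd, rho) in per_order_stats.items() if rho <= P_b}
    return max(feasible, key=feasible.get) if feasible else 1

# Example: 16-QAM violates the BER target, so 8-QAM is chosen.
M_star = best_modulation({4: (1.0, 1e-4), 8: (1.4, 1e-3), 16: (1.8, 5e-2)},
                         P_b=1e-2)
```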
D. The proposed adaptive modulation scheme based on DRL
In the traditional Q-learning algorithm, the Q value is stored in a table: the horizontal axis of the two-dimensional table indexes the states, the vertical axis indexes the actions, and each entry is the Q value of the corresponding state-action pair. For a low-dimensional state space, the Q table can hold all states, and the optimal action can be selected by directly querying the table. However, the time-varying underwater acoustic channel has a large-scale, continuously changing state space [23]. The Deep Q-Network (DQN) is model-free, aiming to find the mapping between state-action pairs and Q values. The temporal-difference (TD) method combines Monte Carlo sampling with the bootstrapping of dynamic programming (using the value function of the subsequent state to estimate the current value function), so it applies to model-free algorithms and updates in a single step with faster speed. For Q-learning, in which the action is a discrete variable, \({Q}^{*}\left(s,a\right)\) is approximated by a deep neural network. We still transform the continuously changing state into a discrete state for each time slot; because the neural network can automatically extract complex features, we do not quantize the CSI but keep the feature vector as the input.
1) Model input
A common choice is to take the channel frequency-domain response estimated by the receiver, combined with the SNR of the equalized subcarriers, as the input vector of the network. However, owing to the sparsity of the underwater acoustic channel, we instead transform the estimated channel frequency-domain response into the time-domain impulse response. Using a sparse-matrix storage scheme, we can effectively reduce the amount of data and denote the network input signal as \(x\):
\(x=\left[\begin{array}{cc}{h}_{sparse}& SNR\end{array}\right],\)  (17)
To reduce the dimension of the input data, the \(n\) largest peaks of the time-domain impulse response are extracted in advance, keeping the input size consistent:
\({h}_{sparse}=\left[\begin{array}{cccc}{A}_{1}& {A}_{2}& \cdots & {A}_{n}\\ {\tau }_{1}& {\tau }_{2}& \cdots & {\tau }_{n}\end{array}\right].\)  (18)
As in Eq. (1), \({A}_{i}\) and \({\tau }_{i}\) are the amplitude and relative delay corresponding to the different paths.
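A possible construction of the input of Eqs. (17)-(18) is sketched below; the IFFT-based peak extraction and the delay-referencing convention are our assumptions about the preprocessing, not a confirmed implementation detail.

```python
import numpy as np

def sparse_channel_input(h_freq, snr, n=8):
    """Build the network input x of Eqs. (17)-(18): transform the estimated
    frequency response to a time-domain impulse response, keep the n
    strongest taps (amplitudes A_i and relative delays tau_i), and append
    the SNR. A sketch; windowing/normalization details are assumptions."""
    h_time = np.fft.ifft(h_freq)            # frequency -> time domain
    idx = np.sort(np.argsort(np.abs(h_time))[-n:])  # n largest peaks, in delay order
    amplitudes = np.abs(h_time[idx])        # A_1 ... A_n
    delays = idx - idx[0]                   # tau_i relative to the first path
    return np.concatenate([amplitudes, delays.astype(float), [snr]])

# Example with a random 1024-subcarrier channel estimate (illustrative only):
x = sparse_channel_input(np.random.randn(1024) + 1j * np.random.randn(1024),
                         snr=12.0, n=8)
```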
2) Adaptive modulation algorithm based on deep reinforcement learning
The modification of Q-learning by DRL is mainly reflected in three aspects:
(1) DRL uses a deep neural network to approximate the value function;
(2) DRL uses experience replay to train the reinforcement learning process;
(3) DRL independently sets up a target network to handle the TD deviation in the temporal-difference algorithm.
In DRL, the Q-learning algorithm of reinforcement learning and the stochastic gradient descent (SGD) training of deep learning are carried out synchronously. We use the powerful fitting ability of the neural network to approximate the action-value function in Q-learning, so that \(Q\left(s,a\right)\approx Q\left(s,a;\theta \right)\). The update rule is:
\(Q\left(s,a\right)\leftarrow Q\left(s,a\right)+\alpha \left[r+\gamma \max_{a'}Q\left(s',a'\right)-Q\left(s,a\right)\right],\)  (19)
where \(TargetQ=r+\gamma \max_{a'}Q\left(s',a'\right)\). The loss function uses the mean-squared error to update the parameters in each iteration: \(L\left(\theta \right)=E\left[{\left(TargetQ-Q\left(s,a;\theta\right)\right)}^{2}\right]\), where \(\theta\) denotes the network parameters, \(\alpha\) is the learning rate, \(s'\) and \(a'\) are the state and action in the next iteration, respectively, and \(\gamma\) is the discount factor in the TD method.
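A PyTorch sketch of the mean-squared TD loss \(L(\theta)\) follows; the network interfaces and batch layout are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

def dqn_loss(main_net, target_net, batch, gamma=0.9):
    """Mean-squared TD loss L(theta) = E[(TargetQ - Q(s,a;theta))^2].
    `batch` holds tensors (s, a, r, s_next); shapes are assumptions:
    s, s_next: [B, state_dim], a: [B] (long), r: [B]."""
    s, a, r, s_next = batch
    q_sa = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s,a;theta)
    with torch.no_grad():                                    # TargetQ, Eq. (19)
        target_q = r + gamma * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target_q)
```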
In the learning phase of each time slot, we use an \(\epsilon\)-greedy strategy to traverse all possible actions in each channel state: the greedy component exploits the locally optimal action, and \(\epsilon\) is usually set as a constant in a system. A number \(\xi\) between 0 and 1 is generated uniformly at random:
\({a}_{t}=\left\{\begin{array}{ll}rand\left({A}_{t}\right), & \xi <\epsilon \\ \text{arg}\max_{{a}_{t}}Q\left({s}_{t},{a}_{t}\right), & \xi \ge \epsilon \end{array}\right.\)  (20)
The action that maximizes the Q value thus has a higher probability of being selected, while the Q values of all possible actions can still be learned. We therefore anneal \(\epsilon\) during training:
\(\epsilon =\max\left(0.01,\ 0.2-0.1\times \frac{{N}_{episode}}{{N}_{0}}\right),\)  (21)
where \({N}_{episode}\) is the number of episodes the system has completed and \({N}_{0}\) is a constant controlling the decay rate. \(\epsilon\) gradually decreases as training proceeds. If the environment changes rapidly, \(\epsilon\) should be increased so that the system is more likely to train the Q values of other actions in the current state, which maximizes the reward of the selected action.
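Eqs. (20)-(21) can be combined into a short action-selection routine; \(N_0 = 100\) below is a placeholder, since the text leaves the constant unspecified.

```python
import random

def epsilon(n_episode, n0=100):
    """Annealed exploration rate of Eq. (21); n0 is a placeholder value."""
    return max(0.01, 0.2 - 0.1 * (n_episode / n0))

def select_action(q_values, actions, n_episode, n0=100):
    """Epsilon-greedy choice of Eq. (20): explore with probability epsilon,
    otherwise take the action with the largest estimated Q value.
    `q_values` maps each candidate action to its current Q estimate."""
    if random.random() < epsilon(n_episode, n0):
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```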
Each episode runs over a sequence of time slots during which the channel environment changes slowly. The agent judges whether the performance meets the requirements and computes the reward and penalty by observing the average maximum throughput and the average BER corresponding to each modulation order; the penalty bit \(\varTheta\) marks a zeroed reward. In the same network, the weights learned for one task may change completely when learning a new task, because the optimization targets differ in a time-varying underwater acoustic channel environment with significant differences: the objective function is the same, but the data sets are different. Since the old weights are easily damaged, batch random sampling is adopted. We store 10000 groups of data, each containing the current environmental state information, the modulation mode, and the value score obtained. The agent samples 100 groups of data from the experience replay, learns over the whole mini-batch, computes the average gradient, and then updates the network. During the update, only the Q value corresponding to the current modulation is updated; the others remain unchanged.
We use a sample buffer with a sliding-window mechanism. The sequence stored in the experience replay is \([{S}_{t},{A}_{t},{R}_{t},{S}_{t+1}]\), the cache length is \(L=1000\), and the buffer is initialized empty. Each learned state-transition sequence is appended in turn; once the buffer is full, the oldest sample is deleted before the new sample is stored.
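The sliding-window buffer can be realized with a fixed-length deque; a minimal sketch using the capacities stated above:

```python
import random
from collections import deque

class ReplayBuffer:
    """Sliding-window experience replay: a deque of fixed length L drops
    the oldest transition automatically when a new one is appended."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=100):
        return random.sample(self.buffer, batch_size)
```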
DQN contains two networks with exactly the same structure but different parameters. As shown in Fig. 5, the Q-estimation network mainNet is the virtual training network: at each step its parameters \(\theta\) are updated from the samples collected in the mini-batch, whereas the Q-target network targetNet keeps parameters from an earlier iteration. The weights of mainNet are copied to targetNet every \(C\) iterations. \(Q\left(s,a;{\theta }_{i}\right)\) denotes the output of the current network mainNet, used to evaluate the value function of the current state-action pair; \(Q\left(s,a;{\theta }_{i}^{-}\right)\) denotes the output of targetNet, which mainly provides \(\max Q\) for computing the target. Therefore, when the agent acts on the environment, it can compute Q according to the formula and update the parameters of mainNet according to the loss function. To prevent overestimation and keep the Q value closer to its true value, the optimal action is selected using the parameters of the Q network currently being updated. This completes one training episode.
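The periodic weight copy and the overestimation-mitigating target described above (a Double-DQN-style target, matching line "Set \(y_t\)" in Algorithm 1) can be sketched as follows; tensor shapes and the discount value are assumptions.

```python
import torch

def sync_target(main_net, target_net):
    """Copy mainNet weights into targetNet (done every C iterations)."""
    target_net.load_state_dict(main_net.state_dict())

def double_dqn_target(main_net, target_net, r, s_next, gamma=0.9):
    """Double-DQN-style target: the action is selected with the current
    mainNet parameters and evaluated by targetNet, mitigating the
    overestimation discussed above. r: [B], s_next: [B, state_dim]."""
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=1, keepdim=True)   # select with theta
        return r + gamma * target_net(s_next).gather(1, a_star).squeeze(1)
```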
Specifically, the pseudo-code of using the DRL algorithm to find the best transmission strategy is shown in Algorithm 1.
Algorithm 1: DRL-Based AM Algorithm With TD Strategy
Input: CSI and SNR parameters
Output: optimal action \({a}_{t}^{*}\) of each time slot
Initialize replay memory \(D\) to capacity \(N\)
Initialize state-value buffer \(B\) to capacity \(P\)
Initialize state-value Q-function with random weights \(\theta\)
Initialize target state-value Q-function with weights \({\theta }^{-}=\theta\)
Initialize sequences \(S\), \(A\), and \(R\)
For episode = 1, \(M\) do
    Initialize sequence \({s}_{1}=\left\{{x}_{1}\right\}\) and the preprocessed sequence
    For step = 1, \(T\) do
        With probability \(\epsilon\) select a random action \({a}_{t}\); otherwise choose the optimal action \({a}_{t}=\arg\max_{a}{Q}^{*}\left({s}_{t},a;\theta \right)\) for the time slot
        Evaluate the BER and \(\varTheta\) after passing through the system in each rate region
        Evaluate the spectral efficiency and maximum throughput
        Obtain reward \({r}_{t}\)
        Send the information to the receiver through the feedback link in every time slot
        Store transition \(\left\{{s}_{t},{a}_{t},{r}_{t},{s}_{t+1}\right\}\) in \(D\)
        Sample a random mini-batch of transitions from \(D\)
        Set \({y}_{t}=\left\{\begin{array}{ll}{r}_{t}, & \text{if the episode terminates at step } t+1\\ {r}_{t}+\gamma \widehat{Q}\left(s',\underset{a}{\arg\max}\,\widehat{Q}\left(s',a;\theta \right);{\theta }^{-}\right), & \text{otherwise}\end{array}\right.\)
        Perform a gradient descent step on \({\left({y}_{t}-Q\left({s}_{t},{a}_{t};\theta \right)\right)}^{2}\) with respect to \(\theta\)
        Every \(C\) steps reset \({\theta }^{-}=\theta\)
    End for
End for