Hybrid Speech Steganography System using SS-RDWT with IPDP-MLE approach

Defence applications generally require highly secure communication over the channel. Most defence communications are carried by audio and speech signals in a secured environment, and considerably advanced technologies have been implemented over the last two decades to secure these digital media and speech signals. However, transmitting digital media over secured and unsecured channels creates several serious problems in defence data transfer applications, because data in digital format are subject to equivalence of works, plasticity, ease of replication and unauthorized use. A possible solution is to secure the digital media by cover writing, that is, by using speech steganography so that the digital media are effectively hidden in the speech. To overcome these problems, this article focusses on securing both the speech and the message through a hybrid speech steganography approach. The system is implemented with the spread spectrum-based redundant discrete wavelet transform (SS-RDWT) with additional pause removal properties. The input cover speech signal is large, so the system requires higher power usage, computational time and extra storage capacity. Thus, an intelligent pause detection protocol (IPDP) is implemented to remove the pauses and reduce the size of the speech signal. The performance of the IPDP is improved by the maximum likelihood estimation (MLE) procedure. Finally, SS-RDWT speech steganography is implemented along with the IPDP-MLE-based pause removal mechanism. The simulation results show that the proposed method performs better against various noise attacks and improves robustness, imperceptibility and security compared with the conventional approaches.


Communicated by Suresh Chandra Satapathy.
R. Chinna Rao (rayudu.chinnarao@gmail.com)

Introduction
Nowadays, the security of defence communications is an important criterion because anti-social elements use these communication media and decrypt the messages carried in speech. This has increased both direct and indirect attacks in all countries. Countries such as India are threatened by various terrorist activities, and terrorist organizations interfere with the surveillance of speech and message signals in mobile and satellite communications. Thus, the quality and security of the speech communication system need to be improved. This can be achieved effectively by using watermarking, encryption and steganography systems (Dutta et al. 2019). Digital watermarking (Kumar et al. 2021) is a marking process that embeds data or a tag in a noise-tolerant carrier signal such as digital audio, digital speech or another digital medium. The uncorrelated hidden information embedded in the carrier signal is used to verify the integrity or authenticity of the carrier and to identify the copyright owner. In cryptography (Kumar 2017), the message is scrambled into an illegible format without hiding the existence of the message. The word steganography means 'covered writing' and is derived from the Greek language. Steganography provides more security as well as capacity and maintains the quality of the cover media despite the payload introduced by the algorithm. It concentrates on transmitting hidden information over a digital communication network such that the hidden information is untraceable by an eavesdropper who makes every effort to uncover it. Both steganography and digital watermarking employ steganographic techniques to embed secret or hidden message information in noisy cover media. The aim of steganography is to remain imperceptible to humans, and it gives the highest priority to controlling robustness. In a steganography scheme, statistical concealment capability is the main requirement, since it makes attackers less able to differentiate the cover object from the stego object. Among these methodologies, speech steganography performs better because it secures both the message and the speech signal. The speech and message signals are therefore secured over any type of channel, including unsecured channels, which prevents malicious activity by unauthorized users. In wireless communications, the channel over which speech and audio are delivered has limited bandwidth. It is, therefore, important to compress the signal before it is transmitted. To decrease the required bandwidth, speech is encoded with coders that use the properties of speech production and perception to compress the signal. In modern speech coders, the speech can be compressed approximately 10 times without significantly affecting the quality of the perceived speech.
Beyond this point, the compression starts to affect the quality, and to reduce the bit rate further, speech coders take the activity of speech into account. Since a typical conversation alternates between two or more speakers, it is not necessary to send information continuously. By using a voice activity detector, information can be transmitted only when speech is present. In these cases, it is important to have a reliable algorithm that can detect the activity of speech. Even for good speech detectors, the reliability depends on the characteristics of the background noise. If the noise has characteristics similar to speech and the relative level difference between noise and speech is small, the noise can easily be detected as speech. The problem is evident in the presence of non-stationary noise, such as situations in which multiple people are talking in the background (Pala 2016). An additional problem that arises when dealing with non-stationary noise is the presence of music. Music is typically a non-stationary signal and requires continuous encoding so that its quality is not degraded. To address the current problems in speech steganography, this article makes the following contributions:
• To provide more security to the message data, PN sequence-based multiplication is used. This reduces the attacks on the message data while the data are retrieved from the stego speech.
• A new pause detection and removal algorithm is developed using the IPDP approach to reduce the bandwidth of the speech signal. This bandwidth reduction results in higher data rates for the speech signal.
• The speech steganography operation is performed with RDWT-based data embedding and extraction operations, which results in maximum efficiency.
• The proposed method is compared with various state-of-the-art approaches such as FFT (Chen et al. 2015) and DWT. The comparison shows that the proposed hybrid approach is robust against various attacks and noises.
The rest of the paper is organized as follows: Sect. 2 analyses various existing approaches and their drawbacks. Sect. 3 details the proposed pause detection and removal operation and the proposed method, including the embedding and extraction procedures. Sect. 4 presents a detailed analysis of the simulation results for various parameters and compares them with the state-of-the-art approaches.

Literature review
This section gives a detailed analysis of various related works on speech steganography. The problems present in each method are also discussed. In (Zhijun et al. 2016) the authors presented a technique to hide the LSB detail in wavelet coefficients through an integer-to-integer wavelet. The technique avoided the silent parts of the cover audio file to reduce the noise. In (Chen et al. 2015) the authors used an FFT-based algorithm for audio steganography. The technique modifies the amplitude of the samples in the cover audio file after applying the FFT. Hiding an audio file within audio through the Hermite transform has also been proposed. That method selects a hidden audio file that has half the length of the cover audio file. Other authors proposed a speech steganography scheme using DWT and tested the algorithm on several speech signals under different attacks for robustness and imperceptibility. In (Failed 2017) the authors proposed a scheme using a Sudoku matrix-based approach, in which the cover speech signals were divided into several non-overlapping slices. The average energy values of all slices are observed, and the slice with the highest energy is selected as the host speech. The change in coefficient values caused by the embedding process alters the energy condition of the stego file. To ensure correct recovery, an algebraic manipulation is performed in the transmitter so that the detector correctly recognizes the host slice.
To increase the security level, in (Xue et al. 2019) the authors proposed robust multi-level steganography, which utilizes at least two steganography methods of the same or different types, where one method can serve as the carrier for the second. The method combines LSB modification, parity encoding and the DWT transformation technique and provides higher security, since more than one type of secret message is transmitted through a single cover object. In (Wen et al. 2020; Kanhe et al. 2018) the authors proposed a combined SVD and MDCT-based multiple object tracking (MOT) approach along with error-correcting codes for speech steganography to improve the embedding capacity, imperceptibility and security. In (Amiri et al. 2004) the authors combined the GBT, DWT and SVD approaches. Before embedding, the secret messages were pre-processed using Hamming and BCH codes, and MOT was used to identify the region of interest. Afterwards, the secret data were encoded into the DWT and DCT coefficients.
In (Kathum et al. 2016) the authors described a method based on the lifting wavelet transform (LWT) and LSB, which uses the 3rd and 4th bits for the embedding process. An intelligent algorithm revises the 2nd and 5th bits corresponding to the 3rd and 4th bits, respectively, to minimize the difference between the stego file and the actual audio file. The problems of simple substitution in the LWT technique were overcome by applying the genetic algorithm described there. In (Banik et al. 2020a) the authors presented adaptive natural language processing (NLP) for speech steganography to improve capacity and imperceptibility. Here, the sample value differencing approach was used to distinguish the edge and smooth areas of the cover speech and to estimate the number of secret information bits to be embedded, with embedding performed only in the edges.
In (Kanhe et al. 2016) the authors proposed a steganography methodology to hide secret data in an acoustic file. The embedding process is based on the analogue modulation principle: the secret information is converted into digital form, and the resulting data are transformed into a high-frequency signal above the human audibility level. Subsequently, the signal is mixed with a normal music signal to improve the security level. This work does not withstand existing attacks such as direct current, high-pass filter and re-quantization attacks. In speech steganography (Liu et al. 2017a), the cover medium is speech, which may be in any format, of different dimensions and compressed or uncompressed. It can be divided into several frames at various slicing speeds per second, and hence the length of the frames also varies (Tian et al. 2017). Likewise, the hidden information may be speech or text in different formats. Both spatial-domain and transform-domain speech steganography methods are used in this work. Initially, the speech files are separated into a number of speech frames corresponding to the speed of the framing operation. Among the many frames, specific frames are considered as cover speech, and the further embedding operations are similar to either the spatial or the transform domain approaches (Agarwal et al. 2003). A higher number of cover speech frames, and of significant areas in them for embedding hidden information, results in more robustness. The payload capacity, security and robustness of steganography vary, and its perceptibility also changes, depending on the number of bits exchanged during the embedding process (Kreuk et al. 1902).
To improve the security level of speech steganography in vocoders, in (Liu et al. 2017a) the authors used the advanced encryption standard together with a steganography-based technique. In this technique, the secret information is encrypted before the embedding process is applied, and the QIM modifications are performed afterwards. Although the method is simple, its drawback is that the cover audio signal must be 8 times larger than the number of secret bits to be embedded, and the sound quality of the stego file varies with the selection of the audio file and the length of the message to be embedded. The multi-level secured steganography system (Li et al. 2017) combines two to three steganography algorithms to hide more than one secret message, and it provides multi-level security as well as complexity in the decoding process. In (Gong et al. 2018) the authors estimated the number of bits to be embedded in still speech with a transform domain method, aiming at capacity enhancement with various pitch delay models. In (Liu et al. 2016) the authors developed a fast, reversible, file-size-preserving embedding algorithm for speech by remapping the size of the marked variable length coding with minimum loss of quality in the stego object for G.723.1. In this approach, the data were embedded directly in the bitstream of the compressed speech, in redundant areas, for low bit-rate speech codecs.
In (Liu et al. 2017b) the authors presented a spatial domain speech transformation operation that is untraceable through periodic variation in the residual signal obtained with a local linear predictor. The algorithm destroys the periodic dependency between neighbouring samples with a median filter and protects against re-sampling detectors, which were developed to expose scaling and rotation operations. In (Yang et al. 2019) the authors described a steganography scheme for speech that retains the identical codebook of the stego speech after embedding for iLBC vocoders. The method utilizes three adjacent carrier samples for modification and reduces the probability of changes in the samples to one third without lowering the embedding capacity. The authors also concluded that increasing the sample sequence length would not improve the embedding capacity. In (Banik et al. 2020b; Hu et al. 2021) the authors presented a speech steganography scheme that recovers the carrier completely during the extraction process. This method uses a side-matched vector quantization algorithm on the compressed code to hide the secret data with low computational complexity.

Proposed hybrid speech steganography for Defence applications
In recent scenarios, speech steganography has achieved great success in a variety of applications. Security is the major problem in defence applications because the defence sector faces many security attacks from anti-social elements. Thus, maintaining security in defence applications is a challenging task. The conventional audio steganography and encryption-based methods fail to give maximum performance under attacks and in noisy environments. To overcome this problem, the proposed method combines IPDP-MLE pause removal with an SS-RDWT-based speech steganography system. This method is used to transmit messages in speech with low bit coding (reduced memory) and a pause removal mechanism. The time for embedding the secret message information into the speech file and the threshold in variable low bit coding are used to select the embedding location and the embedding bits adaptively. The message embedding rate and embedding capacity are thereby increased compared with the conventional approaches, which makes the system suitable for highly secure defence applications. Figure 1 demonstrates the IPDP-MLE approach for pause detection and removal from the cover speech signal. This method consists of three phases of operation: a mean and root mean square (RMS) calculation phase, an MLE phase based on continuous speech analysis with pause occurrence estimation, and a DTX algorithm-based pause removal phase. The operation of each phase is as follows:

IPDP-MLE approach
Phase 1
A human speech (voice) signal is considered as the input to the system. The pitch values of the speech signal change continuously with time. The speech signal is assumed to contain white Gaussian noise by default. To remove this noise, the mean (μ) and standard deviation (σ) are calculated from the pitch levels of the speech signal, as represented by Eqs. (1) and (2).
Here, the input speech signal x(i) consists of 1600 samples. The standard deviation estimates the average distance of the speech samples from the mean, so the average level at which pauses occur can be identified easily. The mean and standard deviation are also used to differentiate pauses from speech, because a frame containing a pause has a lower standard deviation than a frame containing speech. The procedure is repeated for the entire speech signal.
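The Phase 1 statistics can be illustrated with a minimal Python sketch. The frame length of 1600 samples comes from the text; the 8 kHz sampling rate, the synthetic test signal and the ratio-based threshold in the hypothetical `flag_pauses` helper are assumptions made only for illustration and are not taken from the paper.

```python
import math
import random

FRAME_LEN = 1600  # frame size stated in the text

def frame_stats(frame):
    """Return (mean, population standard deviation) of one frame."""
    n = len(frame)
    mu = sum(frame) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in frame) / n)
    return mu, sigma

def flag_pauses(signal, frame_len=FRAME_LEN, ratio=0.25):
    """Flag a frame as a pause when its std is below ratio * global std.

    The ratio-based threshold is an illustrative assumption.
    """
    _, global_sigma = frame_stats(signal)
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        _, sigma = frame_stats(signal[start:start + frame_len])
        flags.append(sigma < ratio * global_sigma)
    return flags

# Synthetic input (assumed 8 kHz): voiced-like frame, quiet frame, voiced-like frame
rng = random.Random(0)
loud = [math.sin(2 * math.pi * 200 * i / 8000) + rng.gauss(0, 0.05)
        for i in range(FRAME_LEN)]
quiet = [rng.gauss(0, 0.01) for _ in range(FRAME_LEN)]
signal = loud + quiet + loud

print(flag_pauses(signal))  # the middle (quiet) frame is the one flagged
```

Comparing each frame against a global statistic keeps the sketch independent of the absolute recording level; a fixed threshold would be an equally plausible choice.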

Phase 2
The pitch of the speech signal varies with frequency and takes a staircase form. The maximum likelihood procedure (MLP) readily analyses this pitch-based staircase form and yields a better voice activity detection analysis. To do so, the MLP utilizes maximum likelihood estimation (MLE) along with a pitch-based stimulus selection policy. This MLE-based selection policy analyses the standard deviation and mean properties of the speech signal for each pitch level. The mean and standard deviation levels are applied as input to the MLE algorithm. The MLE is mainly responsible for calculating the pauses and their occurrences, as represented in Eq. (3).
Here, L[X, x] is a probability-based likelihood function. The probability-based analysis is performed on the speech through the MLE-MLP; thus, the percentage of speech and the percentage of pauses are identified, which leads to the calculation of the threshold levels of the pauses. The likelihood estimation thereby analyses speech continuity and path breaks. P[X, x] is the probability density function (PDF), represented in Eq. (4). Furthermore, the probability density function in Eq. (4) can be transformed into y to obtain the likelihood function. Here, IPDP uses the Jacobian transformation method to find the PDF of y. X is an estimator, which is a function of x. The estimator equilibrium vector y is given in Eq. (5).
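As a point of reference only, the textbook Gaussian case can be written out explicitly. The following is a sketch that assumes N independent samples x(i) with normally distributed errors; it is not a reproduction of the paper's Eqs. (3) to (6), and the notation L(μ, σ | x) is an assumption introduced here.

```latex
\begin{align*}
L(\mu,\sigma \mid x) &= \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{\bigl(x(i)-\mu\bigr)^{2}}{2\sigma^{2}}\right) \\
\ln L(\mu,\sigma \mid x) &= -\frac{N}{2}\ln\!\bigl(2\pi\sigma^{2}\bigr)
  - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\bigl(x(i)-\mu\bigr)^{2} \\
\hat{\mu} &= \frac{1}{N}\sum_{i=1}^{N} x(i), \qquad
\hat{\sigma}^{2} = \frac{1}{N}\sum_{i=1}^{N}\bigl(x(i)-\hat{\mu}\bigr)^{2}
\end{align*}
```

Under these assumptions the maximizers of the log-likelihood are the sample mean and the (biased) sample variance, which is why the Phase 1 statistics are natural inputs to an MLE stage.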
IPDP assumes that the errors are normally distributed, e ~ N(0, σ). Finally, by solving, the maximum likelihood function is obtained, as given in Eq. (6).

Phase 3
The bandwidth of the vocoders is optimized by using the discontinuous transmission (DTX) algorithm, as shown in Fig. 2, to eliminate the pauses effectively in mobile applications. DTX is an addition to the voice activity detection (VAD)/VBR operation: it suspends transmission when a pause in the normal flow of conversation is detected in the device and the background noise is stationary. DTX, also known as the silence indicator frame, is composed of the VAD, comfort noise generator (CNG) and zero-crossing rate (ZCR) algorithms. It is utilized to reduce the transmission rate during silence (unvoiced or pause) periods of speech. The purpose of the VAD is to identify whether the audio being encoded is speech. Two situations are possible:
• Presence of the CNG algorithm: the VAD conveys the proper data to the CNG algorithm;
• Absence of the CNG algorithm: non-speech periods are encoded with enough bits to reproduce the background pause.
Perceptual enhancement is a set of optional post-processing steps that attempt to enhance the quality of the signal and to reduce the perception of the artefacts produced by the coding/decoding process. An example of such processing is bandwidth extension. VAD is always implicitly activated when encoding in VBR. The CNG algorithm allows an artificial pause to be inserted during silent intervals of speech. The main use of CNG is to generate an empty pause signal that matches the frequency of the original unvoiced pause data. Thus, the actual pause can be eliminated easily.
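The VAD-driven pause removal described in Phase 3 can be sketched as follows. This is a toy illustration built on short-time energy and zero-crossing rate, not the DTX algorithm of any particular codec; the frame length, the two thresholds and the function names are all assumptions.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_thr=1e-3, zcr_thr=0.3):
    """Toy VAD: enough energy and a moderate ZCR (both thresholds assumed)."""
    return short_time_energy(frame) > energy_thr and zero_crossing_rate(frame) < zcr_thr

def remove_pauses(signal, frame_len=160):
    """Keep only frames classified as speech; also return the per-frame flags."""
    kept, flags = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        active = is_speech(frame)
        flags.append(active)
        if active:
            kept.extend(frame)
    return kept, flags

# Synthetic input: voiced-like frame, low-level noise frame, voiced-like frame
rng = random.Random(0)
voiced = [0.5 * math.sin(2 * math.pi * 200 * i / 8000) for i in range(160)]
noise = [rng.gauss(0, 0.005) for _ in range(160)]
kept, flags = remove_pauses(voiced + noise + voiced)
print(flags, len(kept))
```

The per-frame flags are returned alongside the shortened signal because a receiver that wants to reinsert pauses or comfort noise needs to know where they were.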

SS-RDWT
The major drawback of the DWT-based approaches is their shift-variance property, which makes them unsuitable for steganography applications. This problem arises from the downsampling that follows the LPF and HPF in the DWT, which causes prominent changes (reductions) in the wavelet coefficients of the speech samples. The SS-RDWT method overcomes this problem and is therefore well suited to steganography applications. The proposed SS-RDWT does not decimate the LPF and HPF outputs in its core operation, unlike the standard DWT method, as shown in Fig. 3. Thus, the SS-RDWT eliminates both the upsampling and the downsampling operations, so the number of speech samples is neither increased nor reduced. Let α be the downsampling factor, y[n] the output speech signal, f[n] the input speech signal, l[k] the LPF response, which yields the approximation coefficients, h[k] the HPF response, which yields the detail coefficients, and * the convolution operation. The low-pass (L_j[k]) and high-pass (H_j[k]) filter bank outputs of the existing DWT method are presented in Eqs. (7) and (8). The inverse operation of the DWT filter banks is presented in Eq. (9), where β denotes the upsampling factor. As mentioned above, a number of speech samples are lost because of the downsampling operations. The LPF and HPF filters of the SS-RDWT are presented in Eqs. (10) and (11). The sampling operations are thus omitted in the SS-RDWT, and the analysis (L_j[k] and H_j[k]) and synthesis (L_{j+1}[k]) filter bank operations of the SS-RDWT are presented in Eqs. (12), (13) and (14). From these equations, it is observed that SS-RDWT-based speech steganography results in better imperceptibility and robustness than the standard DWT-based method.
The major improvement offered by RDWT-based speech steganography is that it hides the message information effectively in the speech signal and improves the steganography capacity with low computational complexity. The complexity is reduced because the absence of the shift-variance property and of the downsampling operations results in fewer multiplications. RDWT-based speech steganography is also more immune to various noises and attacks, since the original and output speech signals remain the same size.
The main advantage of SS-RDWT-based speech coding is that wavelets concentrate the speech signal information in a few neighbouring coefficients. The remaining coefficients of the speech signal in the wavelet domain become either zero or negligible in magnitude. This facilitates better compression of the speech signal for storage or transfer. Another advantage of SS-RDWT-based speech coding is related to the psychoacoustic features of the human ear. In the wavelet domain representation of the speech signal, SS-RDWT removes the detail coefficients (high-frequency components) of the speech signal. Such removal of detail coefficients is not detected by human ears. This means that the reconstructed signal at the receiver end is perceptually similar to the original, without loss in quality. Wavelet-based speech coding therefore gives better results than other transform coders.
Fig. 3 Decomposition and reconstruction operations using the RDWT process
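The difference between a decimated and an undecimated transform can be seen in a small Python sketch. It uses a one-level Haar filter pair with circular extension and a 1/2 normalization (chosen so the arithmetic is exact); these choices, and the function names, are assumptions for illustration and are not the filters of the paper's Eqs. (7) to (14).

```python
def haar_dwt(x):
    """One-level decimated Haar DWT: filter, then keep every second output."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_rdwt(x):
    """One-level undecimated (redundant) Haar transform, circular extension."""
    n = len(x)
    approx = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return approx, detail

def haar_irdwt(approx, detail):
    """Inverse for the pair above: approx[i] + detail[i] recovers x[i]."""
    return [a + d for a, d in zip(approx, detail)]

def shift(x, k=1):
    """Circular shift to the right by k samples."""
    return x[-k:] + x[:-k]

x = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]

a_r, d_r = haar_rdwt(x)
a_d, d_d = haar_dwt(x)
print(len(a_d), len(a_r))           # decimated band is half length, redundant band is full length
print(haar_irdwt(a_r, d_r) == x)    # the redundant transform reconstructs the input

# Shifting the input simply shifts the redundant coefficients ...
print(haar_rdwt(shift(x))[1] == shift(d_r))
# ... whereas the decimated detail band changes its values
print(haar_dwt(shift(x))[1], d_d)
```

Because every band keeps the full signal length, each speech sample has its own detail coefficient available for embedding, at the cost of redundancy.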

Speech steganography approach
This section describes the proposed SS-RDWT approach for the hybrid speech steganography system, which uses the pause-removed speech signal obtained from the IPDP-MLE approach as the cover speech. Figure 4 illustrates the block diagram of the proposed SS-RDWT-based speech steganography: the embedding process is presented in Fig. 4a and the extraction process in Fig. 4b.

Embedding process
Step 1: Read the cover speech signal, i.e. the pause-removed speech signal obtained from the IPDP-MLE approach.
Step 2: Apply the RDWT to decompose the cover speech into approximation and detail coefficients.
Step 3: Various types of text messages are generally transmitted in defence applications, such as location information, privacy files and various classified data. Input the secret text message that is to be embedded into the cover speech. Next, convert it into binary data and reshape it into a 1D vector (M_i) of length m × n, as presented in Eq. (15). Usually, the message signal is purely in the real domain, whereas the decomposed speech signal of the RDWT contains both real and imaginary data, which complicates the steganography operation between these different types of data. Hence, it is better to hide the message information in the imaginary part of the speech signal to avoid errors and losses and to provide better imperceptibility.
Step 4: Use the concept of spread spectrum, in which a pseudo-noise (PN) sequence is generated with the same properties as the cover speech signal. The message information is then hidden in the speech at appropriate locations identified by the PN sequence. Here, the PN sequence is taken in the range of -1 to 1, like the speech signal. The chip rate (cr) is then generated to convert the message to the speech rate, which is the spread-spectrum property.
The message signal is converted into the imaginary domain as presented in Eq. (16), and the PN sequence is presented in Eq. (17). The modulation operation spreads the message data b_i by the chip rate cr and generates the spread-spectrum output signal presented in Eq. (18). Here, b_i is the bit-level sampled information of the message signal.
Further, a multiplication factor α is used to increase the embedding strength, so it is treated as the embedding strength factor. It generates the final version of the message signal w_i, as presented in Eq. (19).
Step 5: Finally, the RDWT detail output (v_i) is added to the final version of the message signal w_i to generate the stego speech signal (v'_i), as presented in Eq. (20).
Here, the addition operation takes place bit by bit to obtain an efficient stego signal, which is immune to noise and highly imperceptible because it is modulated with an accurate PN sequence with a higher chip rate and the embedding strength factor.
Figure 4b illustrates the detailed procedure of the proposed extraction process using the SS-RDWT approach on the stego speech signal, where the inverse RDWT operation is applied to the stego speech signal obtained from the embedding process. This process generates a sampled speech signal that does not contain any pauses. To recover the exact speech signal, apply the framing operation and add manual pauses (delays) between the frames to generate the final recovered speech signal. The RDWT generates detail and approximation coefficients as its output. Then, perform the matrix formulation operation by using the data extraction procedure. The result contains both message and speech signal properties. To separate the message-based and speech-based singular values, perform the data extraction operation by generating the same chip rate cr and multiplying factor α; the mathematical expression is presented in Eq. (21).
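The embedding steps can be summarized in a short Python sketch under simplifying assumptions: the message bits are mapped to -1/+1, each bit is repeated cr times, multiplied by a bipolar PN sequence and by an embedding strength alpha, and added to real-valued detail coefficients. The real/imaginary conversion of Eq. (16) is omitted, the detail coefficients are synthetic, and the function names, seed handling and parameter values are hypothetical.

```python
import random

def text_to_bits(text):
    """UTF-8 encode the text and flatten it into 0/1 integers, MSB first."""
    return [int(b) for byte in text.encode("utf-8") for b in format(byte, "08b")]

def pn_sequence(length, seed):
    """Bipolar (+1/-1) pseudo-noise chips; the seed acts as the shared key (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def spread(bits, cr):
    """Map bits 0/1 to -1/+1 and repeat each one cr times (the chip rate)."""
    return [2.0 * b - 1.0 for b in bits for _ in range(cr)]

def embed(detail, bits, cr, alpha, seed):
    """Add alpha * PN * spread-bits to the leading detail coefficients."""
    chips = spread(bits, cr)
    if len(chips) > len(detail):
        raise ValueError("cover is too short for this message and chip rate")
    pn = pn_sequence(len(chips), seed)
    stego = list(detail)
    for i, (chip, p) in enumerate(zip(chips, pn)):
        stego[i] += alpha * chip * p
    return stego

# Synthetic stand-in for the RDWT detail coefficients of a cover frame
rng = random.Random(1)
detail = [rng.gauss(0.0, 0.01) for _ in range(2048)]

bits = text_to_bits("HQ")
stego = embed(detail, bits, cr=64, alpha=0.05, seed=42)
print(len(bits), len(stego))
```

A larger alpha makes the message easier to recover under noise but moves the stego coefficients further from the cover, so the value trades robustness against imperceptibility.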

Extraction process
To extract the data bits from a stego speech frame, analyse the non-diagonal values of the SVD matrix. Fewer zero-valued bits and more one-valued bits are generated from these values, because the amount of embedded information in the stego speech is large. Thus, to identify the message information w(n) reliably, the average of the extracted non-diagonal values (M_avg) is computed first. Then, the non-diagonal value of each frame is compared with M_avg, which yields w(n). This analysis is represented in Eq. (22).
The generated message signal contains both real and imaginary data; the original data are obtained by performing the imaginary-to-real conversion.
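For comparison, the recovery of spread-spectrum bits is commonly done with a correlation detector, sketched below in Python. This is a swapped-in standard technique and not the SVD-based comparison against M_avg described in the text: each block of cr stego samples is correlated with the same PN sequence, and the sign of the result decides the bit. The cover samples, seed, chip rate and alpha are illustrative assumptions.

```python
import random

def pn_sequence(length, seed):
    """Bipolar (+1/-1) pseudo-noise chips regenerated from the shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def extract_bits(stego, n_bits, cr, seed):
    """Correlate each block of cr samples with the PN chips; the sign gives the bit."""
    pn = pn_sequence(n_bits * cr, seed)
    bits = []
    for k in range(n_bits):
        block = range(k * cr, (k + 1) * cr)
        corr = sum(stego[i] * pn[i] for i in block) / cr
        bits.append(1 if corr > 0 else 0)
    return bits

def bits_to_text(bits):
    """Pack groups of 8 bits (MSB first) into bytes and decode as UTF-8."""
    data = bytes(int("".join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="replace")

# Build a small stego signal so the example is self-contained (all values assumed)
message = "HQ"
sent = [int(b) for byte in message.encode("utf-8") for b in format(byte, "08b")]
cr, alpha, seed = 64, 0.05, 42
rng = random.Random(1)
cover = [rng.gauss(0.0, 0.01) for _ in range(len(sent) * cr)]
pn = pn_sequence(len(cover), seed)
stego = [c + alpha * (2 * sent[i // cr] - 1) * pn[i] for i, c in enumerate(cover)]

recovered = extract_bits(stego, len(sent), cr, seed)
print(bits_to_text(recovered))
```

Averaging over cr chips suppresses the cover contribution, which is uncorrelated with the PN sequence, so a higher chip rate tolerates a smaller alpha.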

Results and discussion
This section describes the performance analysis of the proposed hybrid speech steganography using SS-RDWT with the IPDP-MLE approach. Several speech samples from male and female speakers of various age groups were tested with the proposed approach, which showed higher performance than existing speech steganography methods such as the FFT-based approach (Chen et al. 2015) and the DWT-based approach. In addition, SS-RDWT is compared with SS-RDWT combined with the IPDP-MLE approach to demonstrate the effectiveness of the pause removal system in speech steganography applications. Table 1 lists the obtained values of the pauses eliminated using the IPDP-MLE technique. It also gives the values of the lower and higher pitch together with the mean and STD of the MLE. Figure 6 presents the performance of the existing speech steganography methods using the FFT-based and DWT-based approaches. Figure 6a shows that the FFT-based method fails to produce high imperceptibility, as the stego speech is completely dissimilar to the cover speech and even the reconstructed speech does not closely resemble the cover speech. In Fig. 6b, the DWT-based approach obtains a better outcome than the FFT-based method in terms of both robustness and imperceptibility, as the stego speech and reconstructed speech look quite similar, with a few amplitude errors. However, it reduces the stego speech to half of its original size because of the decimation operation of the DWT. The performance of the proposed SS-RDWT is shown in Fig. 6c, where the cover speech, stego speech and reconstructed speech are of equal size and look remarkably similar, which results in higher imperceptibility and robustness than both the FFT-based and DWT-based speech steganography approaches.
Fig. 6 Performance of existing speech steganography methods for sample 1. a FFT-based method (Chen et al. 2015). b DWT-based approach. c Proposed SS-RDWT approach
Table 2 compares the CPU running time of the existing FFT and DWT approaches and the proposed SS-RDWT approach, and shows that the computational complexity of the proposed SS-RDWT approach is considerably lower than that of the existing speech steganography approaches. The same dataset is used for the proposed method and the conventional approaches, so the computational time is compared on the same speech signals. Even if the computing machine changes, the FFT and DWT methods remain more computationally complex because they consist of higher-order filters. The RDWT method is specifically designed to reduce the computational complexity of the DWT by eliminating the filters. Figure 7 depicts the performance of the proposed IPDP-MLE with the SS-RDWT-based speech steganography process for sample 1, where the cover speech, stego speech and reconstructed speech are of the same size and show higher imperceptibility and robustness than the SS-RDWT approach without the IPDP-MLE technique. For instance, Table 2 lists the CPU running time of the proposed SS-RDWT with the IPDP-MLE approach, for which the results are obtained in just 0.5844 s, which is more than 4 times shorter than that of SS-RDWT without the IPDP-MLE technique. The proposed method achieves this speed because more pauses are eliminated from the speech than with the conventional approaches. Thus, the higher pause removal capacity leads to faster speech steganography.
Further, Table 3 gives the size and storage memory of the cover speech, stego speech and reconstructed speech signals with and without the IPDP-MLE technique. As given in Table 3, both the size and the storage memory obtained using the IPDP-MLE technique are considerably smaller than without IPDP-MLE, which means that the proposed hybrid speech steganography requires less storage bandwidth and achieves high-speed communication because very little storage memory is needed.

Robustness against noise attack
The effect of noise is the major problem, and it needs to be reduced to improve the efficiency of the speech steganography system. Therefore, the performance of the proposed speech steganography system is measured in terms of the extracted secret message after random impulse noise is added to the stego speech. The secret message extracted by FFT-based speech steganography is affected by the noise attack, and the message extracted using the DWT-based approach has a higher error rate, whereas the proposed SS-RDWT with IPDP-MLE approach extracts the secret message without errors. In addition, the bit error rate and correlation coefficient are computed for both the existing and the proposed hybrid speech steganography approaches and are given in Table 4.
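The two robustness metrics can be computed as in the following Python sketch. The bit error rate is taken as the fraction of differing bits, and the correlation coefficient is assumed here to be the Pearson coefficient between two sequences, since the paper does not spell out its definition; the example data are made up for illustration.

```python
import math

def bit_error_rate(sent, received):
    """Fraction of bit positions in which the two sequences differ."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have the same length")
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

def correlation_coefficient(x, y):
    """Pearson correlation between two equal-length sequences (assumed definition)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = [1, 0, 1, 0, 0, 0, 1, 0]   # one of eight bits flipped
print(bit_error_rate(sent, received))  # 1 error in 8 bits -> 0.125
print(correlation_coefficient(sent, received))
```

A BER of 0 together with a correlation coefficient of 1 corresponds to error-free extraction.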

Conclusion
The proposed speech steganography system can be used in various types of defence communication systems and defence applications, such as secured narrowband and wideband radio systems, secured mobile and telephone communications, and the transmission of confidential information in speech over the internet. This article focussed on the development of a hybrid speech steganography scheme utilizing SS-RDWT with the IPDP-MLE process, in which the IPDP-MLE technique is employed to reduce power usage, computational time and storage capacity by removing pauses from the cover speech signal. In addition, the performance of speech steganography with and without pause removal was presented. Further, the CPU running time was computed to show the effectiveness of the proposed pause removal-based speech steganography scheme. Furthermore, the proposed SS-RDWT with IPDP-MLE approach was tested against a noise attack and performed better than existing speech steganography approaches in terms of both the BER and CC metrics. Finally, extensive simulation analysis demonstrated that the proposed speech steganography achieves better performance than the conventional approaches. This work can be extended to implement highly secure vocoders for defence applications. The basic property of any vocoder is to provide security-based voice conversion. Since the proposed method provides security together with pause removal, it can be incorporated effectively in vocoders.