CSCNN: Lightweight modulation recognition model for mobile multimedia intelligent information processing

With the advancement of the Internet of Things, the importance of multimedia intelligent information processing technology is increasing. Faced with massive electromagnetic data and limited computing resources on terminal equipment, previous technologies cannot meet the real-time decision-making requirements for processing short-term observations or burst signals in deployable systems. In this paper, we propose the Deep Complex Separable Convolution (DCSC) operation by combining the separable convolution operation and the complex convolution operation. At the same time, to better preserve coupling information between channels and minimize the model size, we propose the Multilevel Separable Convolutional Residual Block (MSCRB). Based on these two methods, we construct the Complex Separable Convolutional Neural Network (CSCNN). This neural network significantly reduces the complexity of the deep learning model. The smallest network we constructed, CSCNN-Tiny, has a model size of 0.760M, which is only 6% of the size of MobileNet; with 0.815M Flops, it is 3.8% of MobileNet. Nevertheless, it achieves a recognition accuracy of 50.97%, only 0.97% lower than MobileNet.


Introduction
In recent years, as the world has entered the information age, wireless communication technology has developed rapidly and Internet of Things (IoT) devices have been widely deployed. As a result, electromagnetic spectrum resources have become increasingly scarce, and electromagnetic space has become more complex and crowded. The security and efficiency of mobile multimedia intelligent information processing are important prerequisites for ensuring full utilization of spectrum resources.
Communication signal modulation recognition is an important part of mobile multimedia intelligent information processing, and naturally faces greater challenges [1]. In the IoT, there are numerous devices of different types and brands, each using various modulation schemes for communication. Modulation recognition technology enables analysis and identification of received wireless signals, allowing for device recognition and classification [2]. IoT devices rely on the wireless spectrum for communication, and modulation recognition technology assists in monitoring and identifying the different modulation schemes within the spectrum. This information facilitates spectrum management, including allocation and conflict resolution [3]. Wireless communication in the IoT is susceptible to interference, eavesdropping, and malicious attacks. Modulation recognition technology detects and identifies the modulation schemes of radio signals, enabling timely detection of anomalous signals or malicious activities; it enhances security monitoring and safeguards IoT communication. Modulation recognition is the process of identifying the modulation scheme of a non-cooperative signal, even when initial information is limited or no prior information is available. As wireless communication technology continues to evolve rapidly, the scenarios and methods used for communication are becoming increasingly diverse [4]. There is an increasing demand for faster signal transmission speeds and better service reliability, posing substantial obstacles to the practical use of modulation recognition technologies [5].
Facing dynamic and complex mobile multimedia environments, traditional modulation recognition technology is limited to shallow learning and relies on manual extraction of signal features, and therefore cannot meet the needs of efficient and safe intelligent decision-making.
In recent years, deep learning has gained widespread attention for its excellent performance [6]. It has been extensively employed, revolutionizing critical fields such as image recognition and object detection. Moreover, deep learning has shown remarkable performance in modulation recognition by making it possible to extract and express the characteristics of modulation signals.
Researchers have proposed innovative approaches to modulation recognition, such as the Contour Stellar Image (CSI) concept introduced by Lin et al. [7]. To enhance the training procedure for modulation recognition, other researchers, such as Ji et al. [8], offered strategies including blind equalization-assisted deep learning networks and made use of transfer learning.
Data in deep learning is typically represented using real numbers. Foundational work on recurrent neural networks (RNNs) and related theories, however, reveals that complex numbers have the potential to provide greater representational capability than real numbers. Complex-valued representations may be easier to optimize, possess better generalization properties, learn more quickly, and provide a noise-resistant mechanism for memory retrieval. To take advantage of the phase information, Tu et al. [9] applied Deep Complex Networks (DCN) to modulation recognition. Experiments show that complex neural networks outperform real-valued neural networks in modulation recognition tasks.
The progress of deep learning has led to an enormous increase in the accuracy of modulation recognition tasks. This advancement, however, comes at the expense of an increasing number of network layers and more complex models [10]. Although deep learning methods are effective, deep network models carry a large computational burden and model size. Despite the availability of GPU acceleration, deep neural networks still require a significant amount of memory due to their large-scale model parameters. This heavy reliance on high-performance hardware presents a challenge in practical applications [11].
In practical applications, a common computing architecture involves the integration of cloud and edge computing [12], as illustrated in Figure 1. This approach optimally leverages the strengths inherent in both, providing more flexible, efficient, and rapidly responsive computing capability. Within this computational framework, certain preliminary processing tasks are delegated to edge devices [13-14], while relatively intricate computations are conducted in the cloud. This enables the effective utilization of computational resources on edge devices while harnessing the formidable computing power of the cloud. Considering the inherent performance limitations of edge terminal devices, coupled with the challenges posed by vast amounts of electromagnetic data and complex electromagnetic environments, conventional technologies fall short of meeting the real-time decision-making requirements for deploying systems that handle short-term observations or sudden signals [15]. Consequently, there is an urgent need for research focused on lightweight neural network models applicable to modulation recognition tasks [16]. To deploy CNN models more effectively on edge devices, it is common practice to compress the model [17], so that the network carries fewer parameters and can simultaneously address problems related to memory and computation speed [18]. Various methods have been proposed for model compression, such as the hybrid pruning method combining weight and convolution kernel pruning [19].
These methods have varying degrees of effectiveness in compressing the model, but they can be operationally complex. Another approach is to improve the efficiency of network convolution. Gholami et al. [20] proposed SqueezeNet, which is composed of Fire modules. Han et al. [21] proposed G-Ghostnets, which demonstrated excellent performance.
In order to achieve model compression without reducing recognition accuracy and to lower the computational complexity of the model, we propose the Deep Complex Separable Convolution (DCSC) operation by combining the complex convolution operation with the separable convolution operation. Simultaneously, to further enhance the recognition performance of the lightweight neural network and avoid information loss caused by residual connections, we design a Multilevel Separable Convolution Residual Block (MSCRB). Based on these two methods, we have designed a novel lightweight neural network, which we refer to as the Complex Separable Convolutional Neural Network (CSCNN).
In summary, we make the following contributions in this paper: 1. We design the DCSC operation by combining the separable convolution operation with the complex convolution operation. This approach enables a reduction in the number of convolutional layers without sacrificing model recognition accuracy, thereby further reducing the model size and computational workload; 2. We introduce a residual architecture, MSCRB, tailored for lightweight neural networks. This architecture alleviates the information loss caused by residual connections and significantly improves the model's recognition performance without increasing the computational complexity or size of the model; 3. We combine MSCRB with DCSC to construct a new lightweight neural network, CSCNN. Experiments show that this network achieves a significant reduction in model size and computational complexity while maintaining good performance.

Related Work
In this section, we will primarily introduce the theoretical knowledge related to complex convolution operations and separable convolution operations and present the design of the classic residual block with a bottleneck structure.

Complex Convolution Operation
Complex numbers offer a superior way of expressing the relationship between amplitude and phase compared to real numbers, making them more adept at handling phase-related issues. The I/Q data in a communication signal is complex-valued; therefore, complex neural networks have significant advantages in the field of modulation recognition. Complex neural networks were first researched in the 1990s, and the concept of DCN has since been introduced. In DCN, the most critical operation is the complex convolution operation, which effectively leverages the correlation between the real and imaginary parts of complex data, thereby enhancing the performance of the network. Currently, most deep learning frameworks only support real-number operations, so constructing a complex network requires building equivalent real-number operations. This part mainly introduces how to perform the complex convolution operation by simulating complex arithmetic with real numbers. A complex matrix Z can be defined through two real matrices α and β:

Z = α + iβ, (1)

where we define α as the real part and β as the imaginary part. Similarly, a complex vector h can be defined through two real vectors γ and δ:

h = γ + iδ. (2)

We can convolve the complex vector h with the complex matrix Z through the complex convolution operation shown in Equation (3):

Z * h = (α + iβ) * (γ + iδ) = (α * γ − β * δ) + i(β * γ + α * δ), (3)

where * denotes the convolution operation. Equation (3) can be simplified into the matrix form

[Re(Z * h); Im(Z * h)] = [α, −β; β, α] * [γ; δ].

In this way, the real part of the output combines same-channel convolutions of the feature maps, while the imaginary part combines cross-channel convolutions, so the phase information hidden in the input feature map can be better utilized. The process of implementing complex convolution is illustrated in Figure 2.
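The decomposition in Equation (3) can be checked numerically: a complex convolution reduces to four real-valued convolutions. The sketch below uses 1-D NumPy convolutions; the array names mirror α, β, γ, δ above and are purely illustrative, not the paper's implementation.

```python
import numpy as np

def complex_conv1d(alpha, beta, gamma, delta):
    """Convolve the complex filter (alpha + i*beta) with the complex input
    (gamma + i*delta) using four real convolutions, as in Equation (3)."""
    real = np.convolve(gamma, alpha) - np.convolve(delta, beta)   # Re = a*g - b*d
    imag = np.convolve(delta, alpha) + np.convolve(gamma, beta)   # Im = a*d + b*g
    return real, imag
```

The result agrees with NumPy's native complex convolution, confirming that a real-valued framework can emulate the complex operation exactly.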

Separable Convolution Operation
The separable convolution operation is the convolutional pattern used by MobileNet [22], and the MobileNet series has been widely applied in recent years. MobileNet is a lightweight convolutional neural network proposed by the Google team in 2017. As shown in Figure 3, its main feature is that the ordinary convolution operation is divided into two steps: a channel-by-channel depthwise convolution and a pointwise convolution with a filter size of 1 × 1. The former convolves each input channel separately, while the latter mixes feature maps from different channels; the pointwise step computes linear combinations of input channels to construct new features. As a result, the number of network parameters and the computational complexity are greatly reduced while model recognition performance is maintained, enabling real-time operation on mobile devices. The upper part of Figure 4 shows the filter used in the ordinary convolution operation, and the lower part shows the depthwise and pointwise filters used in MobileNet; the separable operation splits the ordinary convolution into these two convolution operations.

Let Ks denote the filter size, Kc the number of input channels, Kn the number of filters (output channels), and Fs the size of the feature map. We use t1 to represent the computational complexity of traditional convolution and t2 to represent that of separable convolution. The computational complexity of traditional convolution is

t1 = Ks · Ks · Kc · Kn · Fs · Fs.

The computational complexity of separable convolution is

t2 = Ks · Ks · Kc · Fs · Fs + Kc · Kn · Fs · Fs,

and the number of parameters in separable convolution is

Ks · Ks · Kc + Kc · Kn.

The ratio of separable convolution computation to traditional convolution computation is therefore

t2 / t1 = 1/Kn + 1/(Ks · Ks).

When the number of output channels is large, this ratio is approximately inversely proportional to the square of the convolution kernel size. When the convolution kernel size is 3 × 3, using depthwise separable convolution can reduce the number of parameters to at most 1/9 of the original.
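The complexity formulas above can be verified with a few lines of arithmetic. This sketch counts multiply-accumulate operations for both convolution types; the function name and the concrete channel counts are illustrative.

```python
def conv_costs(Ks, Kc, Kn, Fs):
    """Multiply-accumulate counts for a standard convolution versus a
    depthwise separable convolution.
    Ks: filter size, Kc: input channels, Kn: filters (output channels),
    Fs: feature-map size."""
    t1 = Ks * Ks * Kc * Kn * Fs * Fs                 # standard convolution
    t2 = Ks * Ks * Kc * Fs * Fs + Kc * Kn * Fs * Fs  # depthwise + pointwise
    return t1, t2
```

For a 3 × 3 kernel with 128 output filters, the ratio t2/t1 equals 1/128 + 1/9 ≈ 0.119, i.e. roughly an 8x saving, matching the 1/Ks² limit stated above.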
Experimental results show that the MobileNet series greatly shortens training time and reduces the computational cost of parameter updates while keeping image classification accuracy as stable as possible, providing direction for optimizing subsequent network structures. However, the MobileNet structure still has some limitations, such as insufficient feature extraction leading to low classification accuracy and the loss of feature information in the activation functions of network layers.

Classic Residual Block
The bottleneck structure, originally introduced in ResNet, consists of three convolutional layers: a 1 × 1 convolution for channel reduction, a 3 × 3 convolution for spatial feature extraction, and another 1 × 1 convolution for channel expansion. Residual networks are typically constructed by stacking multiple such residual blocks. Figure 5 illustrates the architecture of ResNet. The bottleneck structure has undergone further refinement in subsequent research, such as the expansion of channels in each convolutional layer, the application of group convolutions to the central bottleneck convolution for more expressive feature representations, and the introduction of attention-based modules to explicitly model interdependencies between channels. Certain studies integrate residual blocks with dense connections to boost performance [23]. Although the residual structure performs well in various tasks, it is rarely used in lightweight networks due to the increased model complexity [24].
We will now explain the principle behind how residual networks address the issue of gradient degradation. With identity shortcuts, the output of a deeper layer can be written as

x_L = x_l + Σ_{i=l}^{L−1} F(x_i, W_i),

where x_L represents a deeper convolutional layer and x_l represents the input layer of the shortcut in the residual block. In backpropagation, the gradient formula for a residual network, after taking derivatives, is

∂Loss/∂x_l = ∂Loss/∂x_L · (1 + ∂/∂x_l Σ_{i=l}^{L−1} F(x_i, W_i)).

In gradient updates within residual networks, an additional constant 1 is thus introduced, mitigating the vanishing gradient problem.
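The constant-1 term can be observed numerically: the derivative of x + F(x) is 1 + F'(x), so even when the residual branch's gradient collapses to zero, the shortcut still passes a unit gradient. A minimal sketch, with an arbitrary illustrative branch function:

```python
def residual(x, f):
    """Identity shortcut plus residual branch: y = x + F(x)."""
    return x + f(x)

def numeric_grad(fn, x, eps=1e-6):
    """Central-difference estimate of dfn/dx at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)
```

With F(x) = 0.25 x² (so F'(x) = 0.5 x), the gradient at x = 2 is 1 + 1 = 2; with a dead branch F ≡ 0, the gradient is still exactly 1 rather than vanishing.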

Our Approach
Our approach mainly utilizes DCSC and MSCRB to construct a lightweight neural network, CSCNN, for communication signal modulation recognition.

Deep Complex Separable Convolution
Inspired by MobileNet and DCN, we propose DCSC. This convolution operation separates the real and imaginary parts of the feature maps and convolution kernels in the original complex convolution, then recombines the real and imaginary parts of the same layer, simulating the separable convolution operation. This approach integrates deep complex operations with the separable convolution operation in a highly effective manner, preserving the recognition accuracy of the network while greatly reducing its size.
In the first step of DCSC, with the number of filters set to 1, we divide the 2M-channel feature maps in the complex convolution operation into M groups, where the i-th group consists of the real-part feature maps and the imaginary-part feature maps of the i-th layer. The 2M-channel filters are divided into M groups in the same way, with the i-th group consisting of the filters of the real-part layer and those of the imaginary-part layer; that is, we recombine the real and imaginary parts of the same layer in both the feature maps and the filters. Then, we perform depthwise complex convolution between the corresponding groups of feature maps and filters, and the resulting M groups of feature maps are concatenated. In the second step, we set the filter size to 1 × 1 and perform ordinary complex convolution. These two steps complete the separable complex convolution in DCSC. At the same time, to avoid convergence issues in complex convolution computations, we apply batch normalization and ReLU activation after each complex convolution operation, completing the DCSC. Figure 6 displays the entire process of DCSC. Figure 7
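The two steps above can be sketched on 1-D I/Q feature maps. This is an illustrative NumPy mock-up, not the paper's implementation: batch normalization is omitted, the shapes and weight names (dw_* for depthwise, pw_* for pointwise) are assumptions, and each complex operation is realized with the four-real-convolution identity from the Related Work section.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dcsc_1d(x_re, x_im, dw_re, dw_im, pw_re, pw_im):
    """Sketch of DCSC on 1-D feature maps.
    x_re, x_im: (M, L) real/imaginary feature maps.
    dw_re, dw_im: (M, k) depthwise complex filters (one per channel group).
    pw_re, pw_im: (N, M) pointwise 1x1 complex filters."""
    M = x_re.shape[0]
    # Step 1: depthwise complex convolution -- the i-th real/imaginary channel
    # pair is convolved only with the i-th complex filter, then the M groups
    # are stacked (concatenated).
    d_re = np.stack([np.convolve(x_re[i], dw_re[i], 'same')
                     - np.convolve(x_im[i], dw_im[i], 'same') for i in range(M)])
    d_im = np.stack([np.convolve(x_im[i], dw_re[i], 'same')
                     + np.convolve(x_re[i], dw_im[i], 'same') for i in range(M)])
    # Step 2: pointwise (1x1) complex convolution mixes channels across groups.
    y_re = pw_re @ d_re - pw_im @ d_im
    y_im = pw_re @ d_im + pw_im @ d_re
    # ReLU applied to real and imaginary parts (batch norm omitted here).
    return relu(y_re), relu(y_im)
```

Comparing against NumPy's native complex arithmetic confirms the per-group and channel-mixing steps are consistent with the complex convolution they simulate.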

Multilevel Separable Convolution Residual Block
Because the ReLU activation function has a gradient of 0 for non-positive inputs, a substantial proportion of depthwise convolution weights tends to be zeroed out during the training of separable convolutional neural networks. Once this happens, subsequent iterations cannot reactivate the corresponding neuron nodes. The residual architecture in ResNet significantly alleviates this feature-degradation issue. However, the residual blocks in ResNet exhibit higher computational complexity, making them less suitable for lightweight neural networks. In response to this challenge, we have devised the MSCRB specifically for lightweight neural networks.
Our idea is to replace the common convolutions in the fundamental blocks with separable convolutions. In MSCRB, we begin with a separable convolution with downsampling. Subsequently, standard convolution operations with dimensionality expansion are performed. The inclusion of a shortcut connection to the high-dimensional representations enables the network to preserve more information from the lower layers during gradient propagation towards the upper layers, thereby augmenting cross-layer gradient propagation. Finally, a second round of separable convolution is executed.
Within the building block, we conduct separable convolutions twice; the dimension is reduced first, and standard convolutions are inserted between the two separable convolutions to expand the dimension. This approach leads to a substantial reduction in both parameters and computational costs. Figure 8 illustrates the architecture of MSCRB. The architecture retains additional information exchanged between blocks, facilitating improved optimization of network training through the utilization of high-dimensional residuals. Moreover, for enhanced spatial representation, instead of placing spatial convolutions in the bottleneck with compressed channels, we suggest applying them in the expanded high-dimensional feature space to enhance model performance. Additionally, we utilize pointwise convolutions for the channel reduction and expansion process, aiming to maximize the reduction in computational costs.
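The block structure described above can be sketched at the shape level. The exact MSCRB topology is given in Figure 8, not reproduced here; this NumPy mock-up merely follows the described order (separable convolution with downsampling, pointwise expansion, a spatial convolution with the shortcut taken in the expanded space, pointwise reduction, a second separable convolution), and all weight names and the placement of activations are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def depthwise(x, w, stride=1):
    """Per-channel 1-D convolution, then optional downsampling."""
    out = np.stack([np.convolve(x[i], w[i], 'same') for i in range(x.shape[0])])
    return out[:, ::stride]

def pointwise(x, w):
    """1x1 convolution = linear mixing of channels."""
    return w @ x

def mscrb(x, dw1, pw1, w_up, dw_mid, w_down, dw2, pw2):
    """Illustrative MSCRB forward pass on (channels, length) feature maps."""
    h = relu(pointwise(depthwise(x, dw1, stride=2), pw1))  # separable conv, stride 2
    e = relu(pointwise(h, w_up))                           # pointwise expansion
    e = e + relu(depthwise(e, dw_mid))                     # spatial conv + shortcut
                                                           # in the expanded space
    h = pointwise(e, w_down)                               # pointwise reduction
    return relu(pointwise(depthwise(h, dw2), pw2))         # second separable conv
```

Because the only full-rank mixing operations are 1 × 1, the block's cost is dominated by cheap pointwise products, consistent with the parameter savings claimed above.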

Complex Separable Convolutional Neural Network
By combining the DCSC and MSCRB, we form the Complex MSCRBlock illustrated in Figure 9. By stacking multiple instances of the Complex MSCRBlock, we design our lightweight neural network architecture, CSCNN. The network begins with an ordinary complex convolutional layer, followed by our residual blocks.
The output of the last building block passes through a global average pooling layer, which converts the 2D feature maps into 1D feature vectors. A fully connected layer is then added to produce predictions over the eleven modulation categories.
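The classification head just described is standard and can be sketched directly; the weight names below are illustrative, and the eleven-way output matches the eleven modulation categories in the dataset.

```python
import numpy as np

def classifier_head(feature_maps, w_fc, b_fc):
    """Global average pooling followed by a fully connected layer.
    feature_maps: (channels, length) output of the last block.
    w_fc: (11, channels), b_fc: (11,) for the eleven modulation classes."""
    pooled = feature_maps.mean(axis=1)     # (channels,) global average pooling
    logits = w_fc @ pooled + b_fc          # (11,) class scores
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()
```

Global average pooling removes the dependence on feature-map length, so the same head serves every CSCNN variant regardless of how much downsampling the stacked blocks perform.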
We can build CSCNN models with different sizes and performance levels by stacking Complex MSCRBs with different parameters, where the adjustable parameters mainly include the stride and the ratio between the original layer and the intermediate layer after downsampling. To enhance the generalization capability of CSCNN for deployment on multiple hardware platforms, we designed CSCNN models at four levels of complexity: CSCNN-Large, CSCNN-Middle, CSCNN-Small, and CSCNN-Tiny. Table 1 presents the architecture details of CSCNN-Large. The default downsampling ratio in the middle of our residual blocks is set to 6. In the network design, the I-way and Q-way of the signal are connected through the complex convolutional network, and the coupling features between the I-way and Q-way are extracted to improve recognition accuracy. Then, considering the number of network parameters and the computational complexity, small convolution kernels are used in multi-layer convolutions to reduce model parameters, and multiple nonlinear activation layers are integrated to replace a single nonlinear activation layer, enhancing discriminative ability. Because small convolution kernels bring problems of insufficient capacity and a limited receptive field, we compensate by increasing the number of convolutional layers.

Experiment and Dataset
In this section, we mainly introduce the experimental details and dataset used in the experiment.

Experiment Setup
We conducted experiments with Python 3.8.10 and PyTorch 1.8. The CPU is an Intel Core i5 processor, and the models are trained on an Nvidia RTX 4070 graphics card with 12 GB of memory, configured with CUDA 12.1 and CuDNN 8.9.4.

Experiment Dataset
The dataset for this experiment is RadioML2016.10a [25], a communication signal dataset specially provided for machine learning, generated by simulation on the GNU Radio platform. GNU Radio is an open-source software platform that provides various signal modules for building radio communication systems, such as signal generation, modulation and demodulation, filter, and communication channel modules, which makes it convenient to simulate various modulated signals. Although the dataset is generated on a simulation platform, the generated signals are very close to real scenes. Timothy J. O'Shea explained in the literature that real speech signals were used in generating this public dataset, and that the GNU Radio open-source software simulates realistic channel conditions involving many parameters, such as center frequency offset, multipath fading, channel noise, and sampling frequency deviation. Real signals captured over the air are passed through random channels, the resulting output is resampled at random times, and the final results are saved as vectors.

Experiment Details
We divide the dataset in a 3:1:1 ratio during the algorithm research phase. To avoid bias induced by imbalanced samples, the signal samples of different modulation schemes are randomly sampled according to this ratio under varied signal-to-noise ratio (SNR) conditions. During the experiments, we found that all models converged within 23 epochs on the RML2016 dataset; therefore, we trained each model for 25 epochs and recorded the loss values and accuracy during training. We evaluate each model using four metrics: model size, model parameters, Flops, and accuracy. The model size, model parameters, and Flops are calculated using the Thop library.
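A 3:1:1 split that samples within every modulation/SNR group can be sketched as follows. This is a hedged illustration of the splitting strategy, not the paper's code: the tuple layout of `samples` and the grouping key are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(3, 1, 1), seed=0):
    """Split samples 3:1:1, sampling within each (modulation, SNR) group so
    that no class/SNR combination is over-represented.
    `samples` is a list of (modulation, snr, signal) tuples (illustrative)."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s[0], s[1])].append(s)
    rng = random.Random(seed)
    total = sum(ratios)
    train, val, test = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n = len(group)
        n_train = n * ratios[0] // total
        n_val = n * ratios[1] // total
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Splitting per group rather than globally guarantees every (modulation, SNR) cell appears in all three subsets in the same proportion.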

Ablation Studies
This section discusses the effectiveness of the proposed methods. Using AlexNet and MobileNet as baselines, we designed three improved experiments.
(1) We replaced the separable convolution operations in MobileNet with DCSC to construct the network C-MobileNet, thereby validating the effectiveness of our proposed DCSC method. (2) We used MSCRB as the basic block to construct MSCRB-Net, verifying the effectiveness of the novel residual architecture. (3) We conducted experiments using CSCNN.
Figure 10 shows the recognition accuracy and training loss of the different network models across training epochs, and Table 2 presents the metrics of each network.
Importance of the residual block. MSCRB alleviates the issue of information loss during the propagation of network information, with MSCRB-Net achieving a 4.77% higher recognition accuracy than MobileNet. Effect of using DCSC. DCSC enhances the correlation between the real and imaginary parts of complex data, maintaining consistent recognition rates while reducing model size and FLOPs by 20% compared to MobileNet.
Superiority of the CSCNN. CSCNN exhibits a large reduction in model size and computational complexity while still achieving recognition performance 0.46% higher than AlexNet and 4.77% higher than MobileNet.

Performance of Our Nets
In this section, we experimentally evaluate all the networks we constructed, demonstrating the various networks designed for different hardware specifications.
We constructed four scales of MSCRB-Net, mirroring the CSCNN variants: Net-Large, Net-Middle, Net-Small, and Net-Tiny. We then performed trials on the networks we constructed, including the eight networks based on MSCRB-Net and CSCNN, along with C-MobileNet.
Figure 11 shows the training loss and accuracy of the networks we constructed on the RML2016 dataset, and Table 3 presents their performance. The smallest network, CSCNN-Tiny, is only 0.760M, just 6% of the size of MobileNet; with Flops of 0.815M, it is 3.8% of MobileNet. Nevertheless, it achieves a recognition accuracy of 50.97%, only 0.97% lower than MobileNet. The largest network we constructed, CSCNN-Large, exhibits a 65% reduction in model size and a 25% reduction in Flops compared to AlexNet, while achieving a 0.46% improvement in accuracy. Each of the four networks reduces computational complexity and model size by approximately 2-4 times compared to the preceding network, with only a slight decrease of around 1% in recognition accuracy. This makes them well suited to hardware devices with a variety of specifications.
The scatter plot in Figure 12 displays the performance of all experimental networks, further illustrating the superiority of the networks we constructed.

Conclusions
This paper proposes DCSC, based on the separable convolution operation and the complex convolution operation. Furthermore, it introduces a novel residual structure, MSCRB, which is combined with DCSC to construct a new lightweight neural network, CSCNN. The network absorbs the advantages of DCN and MobileNet, maintaining recognition accuracy while compressing the model size. Signal modulation recognition is anticipated to improve further as deep learning technology advances. Researchers will likely explore additional methods that strike a balance between model accuracy and lightweight implementation, such as developing more efficient network architectures, exploring transfer learning techniques, or leveraging advancements in hardware acceleration. As these advances occur, the field of multimedia intelligent information processing is expected to make significant progress in achieving high accuracy while maintaining computational efficiency.

Fig. 1
Fig. 1 The combined computing architecture of cloud computing and edge computing.

Fig. 11
Fig. 11 Training accuracy and loss on our Nets.

Table 1
Architecture details of CSCNN-Large.

Table 2
Model performance.

Table 3
Model performance on our Nets.