Modulation format recognition using CNN-based transfer learning models

Transfer learning (TL) appears to be a promising method for transferring knowledge from general to specialized tasks. Unfortunately, experiments with various standard TL models alone do not yield good results. In this paper, we propose a model built from scratch that operates on the Hough transform (HT) of constellation diagrams to improve modulation format recognition. The HT projects the points of the constellation diagrams onto the Hough space, translating constellation points into lines, and features can then be extracted from the HT domain. Constellation diagrams for eight different modulation formats (2/4/8/16-PSK and 8/16/32/64-QAM) are obtained at optical signal-to-noise ratios (OSNRs) ranging from 5 to 30 dB. The proposed system is based on classification and TL. The obtained results indicate that, even at low OSNR values, the proposed system can blindly recognize the wireless optical modulation format with a classification accuracy of up to 99%.


… the spectral efficiency (Tan et al. 2014). Blind modulation format recognition (MFR) techniques can help with this. As a result, determining the modulation format blindly at the receiver side eliminates the sending and receiving of handshaking data (Eldemerdash et al. 2016).
Intelligent coherent receivers capable of completing the MFR task will be required in the next generation of optical communication systems (Musumeci et al. 2018). In optical communication systems, machine learning and deep learning have been used to identify the modulation format, bit rate, OSNR, and other optical performance monitoring measures (Martín et al. 2019). MFR tasks come in two types: with and without channel information.
In the absence of channel information, the MFR is blind (Adles et al. 2014).
In the literature, there have been several past trends for blind optical MFR. In (Zibar et al. 2015), a method based on histograms derived from electric field distributions was presented. Although it has a high classification rate, its computations are complicated. Liu et al. presented a method for MFR that coupled a nonlinear power transformation with a received-signal peak identification algorithm (Liu et al. 2017). Although this method achieves excellent identification accuracy levels, it does so at the expense of a large number of samples for each modulation format.
For the amplitude histograms produced from the constellation diagrams, an approach for MFR was presented employing artificial neural networks (ANNs) (Khan et al. 2012) and deep neural networks (DNNs) (Khan et al. 2016). Because of the similarity of their amplitude histograms, neither of these approaches could distinguish the higher-order phase-shift keying (PSK) modulation formats. Jiang et al. proposed a blind MFR technique based on fast density-peak-based pattern recognition in the 2D Stokes space (Jiang et al. 2018). Furthermore, for the constellation diagrams, the MFR system relies on the use of CNNs.
TL is the process of improving learning in a new task by transferring knowledge from a previously learned one (Krizhevsky et al. 2012). Even though most machine learning systems are built to perform a specific task, the development of algorithms that aid TL remains a major problem in the area. TL can serve as a performance-enhancing optimization or a time-saving shortcut: it transfers knowledge gained on a large database to aid learning on a smaller but similar dataset. TL can be used to solve problems related to predictive modeling.
With the fast growth of artificial intelligence technology, TL theory is quickly being applied to the field of modulation recognition (Roseline et al. 2019). This paper introduces an investigation of optical wireless MFR employing TL techniques based on the HT. Multiple CNN-based classifiers are utilized to recognize the constellation diagrams of the different modulation formats (2/4/8/16 PSK and 8/16/32/64 QAM). For improved pattern classification, the HT is used to transfer the constellation diagram patterns into another space. To assess the performance of the proposed system, the effectiveness of these classifiers is investigated at various OSNR levels. The obtained results show that, even at low OSNR values, the proposed system can blindly recognize the wireless optical modulation format with a classification accuracy of up to 100%.
The rest of the paper is organized as follows. Section 2 presents some fundamentals of the pre-trained CNN models. The proposed MFR system based on TL is explained in Sect. 3. Section 4 presents the experimental results and comparative analysis. Finally, Sect. 5 presents the concluding remarks.

Fundamentals of the pre-trained CNN models
The term "TL" refers to the process of transferring CNN parameters from one recognition task to another that uses a different image database (El-Hag et al. 2021). Almost all deep CNN models trained on natural digital images share the same behavior: their first CNN layers grasp and locate the basic features of the input images, and these features are not unique to one task or dataset. Hence, they can be reused for a wide range of classification tasks and image datasets. As a result, TL can be an effective choice for preventing overfitting when the target database is much smaller than the original one.
Only a few CNN models have been trained on natural images. These models include VGG16, AlexNet, DarkNet-53, DenseNet201, Inception-V3, Places365-GoogleNet, ResNet50, and MobileNetV2. These pre-trained CNN models, trained on the ImageNet database to detect generic objects, can extract the key properties of the modulation constellation diagrams (Khademi et al. 2022). As a result, these models can be retrained and tested on modulation patterns, which is a major advantage of the TL approach. Consequently, TL-based CNN architectures have recently become popular in modulation classification research.
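The TL recipe described above (reuse the generic early-layer features and retrain only the classification head) can be sketched in PyTorch. The helper below is illustrative, not from the paper; the `head_attr` argument and the freezing policy are our assumptions for demonstration (for a torchvision ResNet-50, for example, the head attribute is `fc`).

```python
import torch.nn as nn

def adapt_pretrained(model: nn.Module, head_attr: str, num_classes: int,
                     freeze_features: bool = True) -> nn.Module:
    """Generic TL step: freeze the pre-trained feature extractor and swap the
    final classification head for a new layer sized for the target classes."""
    if freeze_features:
        for p in model.parameters():
            p.requires_grad = False  # keep the generic ImageNet features fixed
    old_head = getattr(model, head_attr)
    # The new head is trainable by default, so only it is updated during TL.
    setattr(model, head_attr, nn.Linear(old_head.in_features, num_classes))
    return model
```

In practice one would call something like `adapt_pretrained(torchvision.models.resnet50(weights=...), "fc", 8)` to target the eight modulation classes; the exact weights argument depends on the torchvision version in use.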
The Dense Convolutional Network (DenseNet) has been proposed (Hemalatha et al. 2021). It connects each layer to every other layer in a feed-forward fashion. Whereas a conventional L-layer convolutional network has L direct connections, a DenseNet with L layers has L(L+1)/2 direct connections, because the feature maps of all preceding layers are utilized as inputs for each subsequent layer. DenseNets have many persuasive merits. They mitigate the issue of vanishing gradients, reinforce feature propagation, enable feature reuse, and dramatically reduce the number of parameters. The input images must be resized to 224×224×3.
The ResNet-50 model is one of the deep residual networks (Rezende et al. 2017). It explicitly reformulates layers as residual learning functions with reference to the layer inputs, instead of learning unreferenced functions. The ResNet-50 architecture has 50 weight layers, while the VGG networks, which were other popular architectures at the time of the ResNet publication, have 16 to 19 weight layers. The residual (skip) connections in ResNet allow the network to be trained much deeper than traditional architectures without suffering from the vanishing gradient problem.
The Xception model presented in (Szegedy et al. 2016) is a 71-layer deep convolutional neural network. It is an enhanced form of the Inception model in which the regular Inception units are replaced with depthwise separable convolutions. Xception demonstrates excellent outcomes in traditional classification problems compared with VGG16, ResNet, and Inception. It requires an input image size of 299×299×3.

Proposed MFR-based TL system
In digital communication applications, accurate MFR is a must. Instead of using traditional feature-extraction-based machine learning (ML) techniques for MFR, this paper introduces an efficient MFR model based on deep learning (DL). It is a model built from scratch, similar to models used in malware detection.
The proposed model consists of five convolution layers. The layers contain 12, 12, 24, 24, and 24 filters, respectively, and they all share a kernel size of 3, a stride of 1, and a padding of 1. In addition, a BatchNorm2d layer is used after each convolution layer, except for the last one, for a total of four BatchNorm2d layers. A 2×2 max-pooling operation is used to select the most effective features. The proposed model ends with a fully-connected layer with 98,304 input neurons and 8 output neurons representing the 8 classes. The ReLU activation function is used with each convolution layer, and LogSoftmax is used in the fully-connected layer. The specifications of the proposed model are illustrated in Table 1.
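The layer specification above can be sketched in PyTorch. The filter counts, kernel/stride/padding, BatchNorm placement, fully-connected size (98,304 to 8), ReLU, and LogSoftmax follow the text; the 256×256 input resolution and the positions of the two 2×2 max-pooling layers are our assumptions, chosen so that the flattened feature map equals the stated 98,304 (24 × 64 × 64) inputs.

```python
import torch
import torch.nn as nn

class ScratchMFRNet(nn.Module):
    """Sketch of the five-convolution-layer scratch model described in the text.
    Input resolution (256x256) and max-pool placement are assumptions."""

    def __init__(self, num_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 12, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(12), nn.ReLU(),
            nn.Conv2d(12, 12, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(12), nn.ReLU(),
            nn.MaxPool2d(2),                                        # 256 -> 128
            nn.Conv2d(12, 24, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(24), nn.ReLU(),
            nn.Conv2d(24, 24, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(24), nn.ReLU(),
            nn.MaxPool2d(2),                                        # 128 -> 64
            nn.Conv2d(24, 24, kernel_size=3, stride=1, padding=1),  # no BatchNorm
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(24 * 64 * 64, num_classes),  # 98,304 -> 8
            nn.LogSoftmax(dim=1),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```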
As a result, several deep-tuned CNN-based TL models are built and employed to identify modulation constellation diagrams effectively with fewer computations while achieving maximum identification accuracy, as illustrated in the proposed MFR system in Fig. 1.

Dataset description
A total of 12,480 different modulation format images have been obtained to precisely evaluate the performance of the proposed system. The collection includes constellation diagrams of the eight considered formats (2/4/8/16 PSK and 8/16/32/64 QAM), captured at OSNR values from 5 to 30 dB.

Pre-processing phase
The constellation diagram is a color image in PNG format with a size of 454×454 pixels. First, this image is converted to a gray-scale image to reduce the computational load, and then an edge detection (ED) technique is applied. ED is a type of image processing that depends on mathematical algorithms to detect the edges of objects in images (Varun et al. 2015). It determines the areas in a digital image where the image intensity varies rapidly.
Edges are a sequence of curved segments that connect the areas of radically varying intensities. A dilation method is used on the produced edges to increase the distinction of edges from the surrounding areas. The basic principle of HT is illustrated in Fig. 2.
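The pre-processing chain (gray-scale conversion, edge detection, dilation) can be illustrated with a minimal NumPy sketch. The luma weights, the central-difference edge detector, and the 3×3 square structuring element are our illustrative choices, not the exact operators used in the paper.

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Collapse an HxWx3 color constellation image to one channel
    (BT.601 luma weights) to reduce the computational load."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def gradient_edges(gray: np.ndarray, thresh: float = 50.0) -> np.ndarray:
    """Central-difference edge detector: mark pixels where the intensity
    changes rapidly (a simple stand-in for the ED step in the text)."""
    gx = np.zeros_like(gray, dtype=float)
    gy = np.zeros_like(gray, dtype=float)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    return (np.hypot(gx, gy) > thresh).astype(np.uint8)

def dilate(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square structuring element, used to make
    the detected edges stand out from the surrounding areas."""
    h, w = binary.shape
    p = k // 2
    padded = np.pad(binary, p)
    out = np.zeros_like(binary)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out
```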
In digital image processing, image analysis, and computer vision, the HT is used as a transformation method (Gioi et al. 2008). The HT is a feature extractor that highlights the most important features. Figure 3 depicts examples of the constellation diagrams generated for each modulation format. The basic idea of the HT is to convert a pixel at (x, y) into a line passing through a point (m, c) in the parameter space, where each point in that space indicates a verified line hypothesis. The line is described mathematically as y = mx + c, where x and y are the coordinates of the point, and m and c represent the line slope and y-intercept, respectively. Hart and Duda adopted the HT with a specified parameter space for a more robust computational method and improved line detection (Zeng et al. 2012).
The HT function returns H, a parameter space matrix. The polar representation of the line equation is (Tsai and Chang 2013): ρ = x cos θ + y sin θ, where ρ is the distance from the origin to the line and θ is the angle of the line's normal. The range of the angle is −90° ≤ θ < 90°.
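A minimal NumPy implementation illustrates how the accumulator matrix H is built: each foreground pixel (x, y) votes along the sinusoid ρ = x cos θ + y sin θ for θ in [−90°, 90°). The bin resolutions below are illustrative defaults, not the paper's settings.

```python
import numpy as np

def hough_transform(binary_img, theta_res=1.0, rho_res=1.0):
    """Accumulate Hough votes for a binary image.
    Returns (H, thetas, rhos): H[i, j] counts pixels consistent with the
    line at distance rhos[i] and normal angle thetas[j] (radians)."""
    h, w = binary_img.shape
    thetas = np.deg2rad(np.arange(-90.0, 90.0, theta_res))
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |rho|
    rhos = np.arange(-diag, diag + 1, rho_res)
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(binary_img)              # foreground pixel coordinates
    for x, y in zip(xs, ys):
        r = x * np.cos(thetas) + y * np.sin(thetas)   # rho for every theta bin
        idx = np.round((r - rhos[0]) / rho_res).astype(int)
        acc[idx, np.arange(len(thetas))] += 1
    return acc, thetas, rhos
```

A set of collinear constellation points therefore produces a sharp peak in H, which is what makes the Hough space a convenient feature domain for the classifiers.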

Performance evaluation metrics
A study of the HT effect is undertaken to assess the proposed model and the TL models utilized in classifying the dataset (8 modulation kinds) and determining the modulation format. The dataset is partitioned into 75% for training and 25% for testing to determine the model strength. The proposed model relies on learned features for classification, in contrast to handcrafted feature extraction methods. Accuracy, loss, precision (Ni et al. 2018), F1-score (Namanya et al. 2020), confusion matrix, precision-recall curve, and ROC curve were used to assess the models. The metrics are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1-score = 2 × Precision × Recall / (Precision + Recall)

where TP is the number of true positives, TN the true negatives, FP the false positives, and FN the false negatives.
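These per-class metrics follow directly from the confusion-matrix counts. The sketch below implements the standard formulas; the example counts in the usage note are hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute per-class evaluation metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```

For hypothetical counts of 90 true positives, 890 true negatives, 10 false positives, and 10 false negatives, this gives an accuracy of 0.98 and precision, recall, and F1 of 0.9 each.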

Result analysis
Different CNN models (EfficientNet b0, EfficientNet b7, ResNet-50, ResNet101_32x8d, DenseNet121, MobileNet V2, Xception, and the proposed model built from scratch) are evaluated in the simulation tests on the constellation diagrams presented in Fig. 3. As previously stated, the dataset is split into 75% for training and 25% for testing. Throughout the training and classification procedures, the Google Colaboratory (Colab) service is used for performance analysis of the simulation tests in terms of the confusion matrix, loss and accuracy curves, and other distinct evaluation metrics. With Google Colab Pro, we have access to fast GPUs, such as the T4 or P100, and a high-memory virtual machine with 25 GB of available RAM. Moreover, all models have been trained on the same Google Colab Pro service. A conditional training method was also used: the best weights of the models are stored only when the best results so far are obtained while evaluating the model on each batch.
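The conditional training method amounts to a best-weights checkpoint: keep a copy of the model weights only when the monitored metric improves. A framework-agnostic sketch (the class name and the higher-is-better metric convention are our assumptions):

```python
class BestCheckpoint:
    """Keep a copy of model weights only when the monitored metric improves."""

    def __init__(self):
        self.best_metric = float("-inf")
        self.best_state = None

    def update(self, metric: float, state: dict) -> bool:
        """Store `state` if `metric` beats the best seen so far.
        Returns True when a new best was recorded."""
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_state = dict(state)  # shallow copy of the weight dict
            return True
        return False
```

With PyTorch, `state` would typically be `model.state_dict()` and `metric` the validation accuracy evaluated on each batch.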
Accuracy and loss curves, confusion matrices, and other derived evaluation metrics are provided in detail to enable thorough comparisons among the CNN models. The hyperparameters were selected as follows. The Adam optimizer was used with a learning rate of 0.0001. In addition, cross entropy was adopted as the loss function, with a batch size of 16 images. Transformations were also randomly applied to the training images before entering the training phase in each epoch. These transformations keep the models in a state of constant learning, reducing overfitting. All models were trained with the same hyperparameters using the PyTorch library in its latest version with Python 3.8. Figures 4, 5, 6, 7, 8, and 9 show the training and testing accuracy and loss curves of the seven CNN models and the confusion matrices, with and without the HT. These figures show that the loss and accuracy curves are both steady. Furthermore, because the proposed CNN model's training and testing curves are almost identical, there is no overfitting of the training data. Therefore, by using fewer epochs, the proposed CNN model beats the competition. Similarly, the accuracy and loss curves of the seven deep-tuned CNN models are similar.

Result analysis without HT
Figures 4 and 5 demonstrate the accuracy and loss curves of the deep-tuned pre-trained models during training over 20 epochs without the HT. As seen from these graphs, the loss and accuracy curves stabilize within the first 10 epochs.
Furthermore, because the training and testing curves for the proposed deep-tuned models are nearly identical, there is no overfitting of the training data. As a result of employing fewer epochs, the proposed deep-tuned CNN models performed better. Figure 6 shows the confusion matrices derived for the seven CNN models. These are multi-classification confusion matrices for the 8 modulation formats examined (2/4/8/16 PSK and 8/16/32/64 QAM). The obtained findings confirm the low misclassification rate, with nearly all modulation formats properly categorized with high accuracy.
With and without the HT, the performance and results of the proposed MFR system employing CNN and TL were examined to show how the HT influences the outcomes, as demonstrated in Tables 1 and 2.

Result analysis with HT
Figures 7 and 8 demonstrate the accuracy and loss curves of the deep-tuned fully-trained models during training over 20 epochs with the HT. As seen from these graphs, the loss and accuracy curves stabilize within the first 10 epochs. Furthermore, because the training and testing curves for the proposed deep-tuned models are nearly identical, there is no overfitting of the training data. As a result of employing fewer epochs, the proposed deep-tuned CNN models performed better. Figure 9 depicts the confusion matrices derived for the seven CNN models. These are multi-classification confusion matrices for the 8 modulation formats examined (2/4/8/16 PSK and 8/16/32/64 QAM). The projection of the constellation diagrams onto the polar space using the HT is implemented to extract additional characteristics from the constellation diagrams. As a result, the HT has merits that improve the model performance. In general, the usage of the HT aids in improving the model efficiency and performance in terms of the accuracy, loss, precision, recall, F1-score, log loss, and confusion matrix.

Comparison results
Tables 2 and 3 illustrate the performance evaluation metrics of the eight deep-tuned CNN models without and with the HT, respectively. Accuracy, loss, precision, recall, specificity, F1-score, and training time are the metrics adopted to assess the performance of the adopted CNN models. All of the used CNN models produce outstanding classification results. As a result, these models may be used efficiently for identifying modulation formats with the HT. Although the Xception model outperforms the other CNN models in terms of most of the utilized metrics, it takes much longer to train than others, such as EfficientNet b0 and ResNet-50. The model built from scratch has the lowest training time among all the classification models. It also has the fewest trainable parameters for the classification tasks. The DenseNet121 model has the lowest loss value and the highest specificity among the studied CNN-based TL models. The proposed model built from scratch is distinguished by its simplicity of structure while achieving results close to the best ones.

Classification complexity
Figures 10 and 11 demonstrate the training time and accuracy for the eight deep-tuned models with and without the HT, respectively. In general, all of the employed deep-tuned CNN models achieved excellent classification results. Additionally, the results reveal that the Xception model outperforms the other deep-tuned CNN models in almost all test metrics. Although the EfficientNet b7 and ResNet101_32x8d models achieved more than 98% classification accuracy, they consume 654 and 668 seconds per epoch, respectively. The model built from scratch has the fewest trainable parameters and the lowest training time among the employed models. The DenseNet121 and Xception deep-tuned models have very high classification accuracy among all the models, reaching almost 99%.
In all evaluation metrics, the proposed MFR system outperforms all existing standard ones. The classification accuracy increases to 99.01% with the Xception model and the HT, and Xception is superior to standard classifiers. This is attributed to the usage of the HT and the CNN-based TL models. The proposed MFR system with the HT achieves almost the same accuracy level as Xception, with the lowest training time and the fewest trainable parameters.

Comparative analysis
Many classic ML and DL algorithms have been used to increase classification performance and accuracy. As a result, this section pits the proposed MFR system against several well-known ML and DL classifiers in order to demonstrate its ability to recognize and classify modulation formats. Table 4 gives a comparison of the overall classification accuracy and computational complexity, including the total trainable parameters and training times:

Approach of (Peng et al. 2020): 93.29% at 10 dB, 50,278,762 parameters, 590.18 s
Approach of (Ponnaluru and Penke, 2020): 95.5% at 10 dB
Approach of (Sun and Ball, 2022): 91.5% at 10 dB

The findings show that the MFR system outperforms all other traditional techniques in terms of all assessment metrics. The proposed system classification accuracy rises to 99.01% with the deep-tuned Xception model, which is considered superior to standard classifiers. The proposed MFR system depends on the model built from scratch with the HT. It achieves almost the same accuracy level as the Xception model, with the lowest training time and the fewest trainable parameters.

Conclusion
The proposed MFR system can be implemented with one of several CNN models with the HT, namely EfficientNet b0, EfficientNet b7, ResNet-50, ResNet101_32x8d, DenseNet121, MobileNet V2, Xception, and a model built from scratch. The key contribution of the proposed system is that it achieves high modulation format recognition performance without requiring data augmentation or sophisticated feature engineering. A thorough comparison of the proposed work with current well-known ML- and DL-based modulation classification techniques was also given in the paper. The comparative findings showed that the proposed system outperforms all tested classification techniques.