Android applications classification with deep neural networks

Currently, Android is the most widely used mobile operating system globally. This platform has become a target for malware activities due to its technological and user appeal, open-source code, and the possibility of installing apps from third-party vendors without many restrictions or centralized control. Although it has security features, recent reports of malicious activities and Android's vulnerabilities have heightened the need for robust frameworks and approaches to improve its security. Recent studies have proposed many security methods, using static analysis, dynamic analysis, and artificial intelligence techniques to prevent malware attacks. The current sophistication of Android malware infections has made the detection of malicious apps a significant challenge. In this study, deep-learning techniques for categorizing Android applications are examined. We first propose an application categorization approach based on a deep belief neural network (DBN). With clearly defined training and testing splits from the CIC-AAGM2017 Android datasets, we then trained our neural network and assessed its classification performance against four conventional deep feed-forward neural networks and seven baseline models based on machine-learning algorithms. The experimental results showed that the proposed neural network could classify Android apps into benign and malicious categories with 98.7% accuracy. The classification accuracy of the DBN-based model is 1.86% higher than that of other deep learning-based models studied in recent research contributions.

the functionalities of mobile devices such as smartphones. There are currently more than 6 billion smart devices, and 72% of them run on the Android operating system [1]. Thus, Android is currently the most popular mobile operating system on the market. Android alone records over 2.5 billion monthly active devices. Hence, Android has become quite popular as an operating system for smartphones, automobiles, and Internet of Things (IoT) devices [2].
Android has made millions of apps easily accessible and downloadable to promote further growth and keep existing users. However, these apps frequently fall short of meeting the most basic security standards. As a result, they can indirectly attack other managed or linked devices (such as Internet of things nodes) that carry out delicate tasks such as controlling automobiles, opening smart locks or performing health checks.
Attackers and malicious users naturally choose a system with a large client base as their target; hence, malware authors produce malware that targets Android applications daily. Recent studies have revealed that 50% of currently accessible apps are susceptible to one or more security vulnerabilities [3][4][5][6]. In the future, there will be an increase in the sophistication, quantity, and variety of malware that targets Android applications [7]. Therefore, there is an urgent need for methods and frameworks to improve security in the Android platform.
It is important to note that Android has a security system that relies on a sandbox and a permissions system, where each Dalvik virtual machine in which a program runs has a distinct user ID issued to it. Application code executes independently of every other application's code, which means that one application cannot access the files of another application.
In addition, Android operates under the Least Privilege Principle: access to sensitive resources is restricted through the granting of permissions, and each installed application is allowed only the minimal capabilities necessary for its operation.
The open structure of Google Play and side-loaded apps make application vulnerabilities quite widespread despite the existence of these security solutions in the Android platform. Google Play does not verify uploaded applications manually [8]. Even though Google discourages users from side-loading applications and third-party applications due to security concerns, some installations from other third-party vendors are still permitted [9,10].
In addition, there are vulnerabilities associated with the Android platform itself. Remote access tools (RAT), which transmit messages, enable cameras for spying, and track GPS data, can access call history, online browser history, installed applications, and more on mobile targets.
The recent development of sophisticated evasive methods by malware developers to undermine existing security measures has made discovering vulnerabilities in the Android platform a daunting challenge. For instance, attackers frequently exploit social engineering tactics to lure unsuspecting users into falling victim, after which sensitive data are stolen. The techniques include drive-by downloads, SMS attachments, games, phishing, spam campaigns, and fake advertisements. Therefore, effective malicious app recognition techniques are necessary to defend our smart devices against attackers.
Current methods for analyzing security vulnerabilities in the Android platform continue to have several shortcomings due to their reliance on predefined patterns. The diversity of Android malware families has rendered machine learning-based approaches, which have been more successful in addressing security issues in other domains, ineffective in the Android platform [11]. Conventional machine-learning models fail to recognize malicious applications because of the obfuscation methods employed in Android malware, as they follow shallow learning methods [12]. Modern Android malware attacks are so advanced that they can identify the presence of the emulators used by machine learning and other AI-based models for analysis and alter their behavior to avoid detection. Deep-learning models work well with large, complicated datasets and can extract unique, self-generated features without manual feature extraction [13]. Consequently, researchers in this field concentrate on creating Android application classifiers based on deep learning.
The classification method presented in this study employs an unsupervised layer-wise training procedure to train a deep feed-forward neural network. With the help of intermediate compressed hidden layers, our model reconstructs the supplied Android samples while learning the necessary information from the input data. To discriminate between malicious instances and legitimate samples, we used a stochastic anomaly threshold technique based on the reconstruction loss. The selection of this threshold is based on the notion that, because the model is trained on benign samples, benign samples will result in low reconstruction error while malicious samples will result in high reconstruction error. Rather than using a single threshold value, we repeatedly ran our model with several thresholds to find an optimal one.
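The threshold search can be illustrated with a minimal sketch. It assumes that every sample already has a reconstruction error, that malicious samples reconstruct with higher error than benign ones, and that the best threshold is the one with the highest validation accuracy. The function name `best_threshold`, the deterministic candidate grid, and the toy values are hypothetical and stand in for the stochastic procedure used in the study.

```python
def best_threshold(errors, labels, candidates):
    """Return the (threshold, accuracy) pair with the highest accuracy.

    A sample is flagged malicious (1) when its reconstruction error
    exceeds the threshold; labels use 1 = malicious, 0 = benign.
    Ties keep the first (smallest-index) candidate.
    """
    best_t, best_acc = None, -1.0
    for t in candidates:
        preds = [1 if e > t else 0 for e in errors]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy validation data (illustrative values only, not from the paper).
errors = [0.02, 0.03, 0.05, 0.04, 0.30, 0.45, 0.06, 0.50]
labels = [0, 0, 0, 0, 1, 1, 0, 1]
candidates = [0.01, 0.05, 0.10, 0.20, 0.40]
threshold, accuracy = best_threshold(errors, labels, candidates)
```

On imbalanced data, a criterion other than accuracy (for example F1 Score) may be a better choice for selecting the threshold.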
With this chosen threshold, our model can accurately classify known and unfamiliar Android applications. We evaluate the proposed model using the CIC-AAGM2017 datasets provided by the University of New Brunswick (UNB), which have also been used in recent research studies. The dataset is composed of original apps from different sources that have undergone several transformations, such as different alignments, replacement of strings and icons, insertion of junk code, and insertion of junk files, to produce samples with varying relationships between the apps. We obtained a highest accuracy of 98.7% on this dataset. We assess the results of the proposed method against recently studied variants of neural networks, including deep neural networks, multilayer perceptron-based models, backpropagation-based models, and rule-based neural networks. We further trained and compared baseline machine learning-based algorithms, including Logistic Regression, Linear SVM, XGBoost, Gradient Boosting, and Decision Tree classifiers. The results show that the proposed approach outperforms competing approaches on the CIC-AAGM2017 dataset, where our accuracy was 98.7%, compared to an average of 92.8% for all the neural networks and 87.8% for all the machine-learning models considered.
The following is a summary of the contributions made by this paper:
• We identified the challenge in categorizing unfamiliar Android apps into malicious and benign categories and presented a deep learning method based on a deep belief network, which is seen in the research literature as the origin of recent advances in training deep neural networks.
• We used an unsupervised layer-by-layer pre-training procedure to provide an efficient method for recognizing unknown applications from voluminous datasets. Only legitimate samples were used to train the model. The model reconstructs the input with a lower reconstruction error for the benign applications' data at the output layer of the neural network.
• We ran several experiments using real-world Android datasets. The findings indicate that our model performs better than the baseline methods in tasks requiring the classification of malicious and legitimate apps.
The rest of our study is structured as follows: Sect. 2 discusses the prior research on Android application categorization using machine learning and deep neural networks. In Sect. 3, we provide an overview of DBNs and outline how we built our model using the TensorFlow library. In addition, dataset creation and model parameter selection are discussed. In Sect. 4, we implement our DBN Android application model by fitting it to the datasets. The test datasets were used to evaluate the classification accuracy of our model, and the outcome is compared to that of the plain neural network-based model under the same conditions. Section 5 presents the summary of the paper.

Related work
The open handset alliance (OHA), an organization working to improve the Android environment, has offered many security solutions for Android platforms, including operating system improvements and software updates. Again, it has proposed that users download files and apps from reputable sites and sources. However, these measures can only be observed by IT-savvy users. Naive users, who unfortunately are in the majority, will always fall victim by not strictly following these guidelines. In recent years, attempts to detect new Android malware attacks have gained the most traction thanks to the availability of techniques based on machine learning and artificial intelligence [14].
Akbar et al. [15] recently presented a permissions-related malware detection method that assesses an application's malicious activity based on suspicious permissions. The authors employed a multilevel technique by extracting important features like permissions from over 10,000 android apps. The researchers used several machine-learning methods, including Support Vector Machines, Random Forest, and Naive Bayes classifiers, to categorize the applications into their malicious or legitimate categories. Zhang et al. [16] subsequently proposed android malware detection by analyzing system call traces for legitimate and malicious android-based apps. The authors trained six machine-learning algorithms for the malware detection system.
Shatnawi et al. [17], Syrris et al. [18], Herron et al. [19], Islam et al. [20] and Raymond et al. [21] recently trained machine learning-based models using static features of application samples to determine the presence of malicious apps in Android systems. Machine learning-based methods have proven successful in detecting Android malware activities, as reported by the above and many other studies. However, the majority of current Android datasets are unlabeled, which limits the ability of machine-learning methods to accurately detect malware applications. Deep learning has emerged as a promising field in cybersecurity, as analytic models built on data allow for the discovery of insights that can aid in the prediction of malicious activities. Lakshmanarao et al. [22] recently targeted specific opcode sequences extracted from an Android apps dataset to train a recurrent neural network for malware classification. Fallah et al. [23] recently modeled instances of mobile traffic data as a series of flows using Long Short-Term Memory (LSTM) neural networks to detect Android malware applications. Even though the most recent research (2022) seems to offer strategies for fending off new threats to the Android platform, sophisticated and complex Android malware can recognize the existence of the emulator used by these models and adjust its behavior to avoid detection. In addition, deep neural network techniques such as LSTMs and RNNs require larger quantities of training data to learn the complex and nonlinear functions necessary to make accurate predictions. Therefore, these networks frequently underperform when the data are not large enough, rendering their models unsuitable for generalization across both large and small sample cases.
In this study, we propose network layer features as the basis to build a deep neural network model that can effectively recognize malicious apps from Android datasets. Specifically, we train a deep belief neural network (DBN) model on a dataset captured from Android applications installed on actual devices. DBNs can be trained with smaller amounts of labeled data by stacking numerous Restricted Boltzmann Machines (RBMs) [24]. Since feature extraction is carried out unsupervised by multiple stacks of RBMs, a small labeled collection is adequate to train a network suitable for classifying Android datasets into benign and malicious apps.
The deep belief network is widely reported by researchers as the brain behind recent advancements in deep neural network training, providing a secure learning process to effectively identify and classify apps. The proposed network architecture yields a binary classifier in which each Android sample is classified into either the malware or the benign category.

Methodology
Studies in Android security using artificial intelligence and machine-learning techniques widely involve the creation, analysis, and benchmarking of methods for understanding security and the issues plaguing the Android platform. The characteristics of our framework and the neural network architecture of our proposed applications classifier are presented in this section. We investigate several training techniques for deep belief neural networks in this study. We first define the Android application categorization problem, give a brief introduction to deep belief networks, and demonstrate how they model our data. We then present a general overview of the pre-training process. Finally, we apply the model to our dataset using carefully defined training and testing sets.
Definition of problem: We are given a set of Android Application Packages (APKs) expressed as an input vector $V_L = \{v_1, v_2, v_3, \ldots, v_n\}$, where each $v_i$ is represented by a vector containing the values of n-dimensional features. Let output_label ∈ {normal APK, malicious APK} be the class label associated with an application. $V_L$ is used to train a deep neural network classifier to learn the characteristics of both normal and malicious APKs. The goal of the trained deep neural network is to categorize a given unlabeled feature vector that has never been seen before by assigning it a label output_label ∈ {normal APK, malicious APK}.

Deep belief neural networks
A deep belief network (DBN) is a deep neural network that gave rise to the unsupervised layer-wise pre-training procedure for deep feed-forward neural networks. We build an entirely unsupervised generative model [25] that mixes undirected and directed interactions between the variables that constitute either the visible layer or the hidden layers. Figure 1 is a representation of the general structure of the DBN model with three hidden layers. As can be seen, we have undirected interactions at the top layers and directed connections at the lower layers. In a DBN, the top two layers always form an RBM. Thus, the joint distribution $p(H_L^{(2)}, H_L^{(3)})$ over layers $H_L^{(2)}$ and $H_L^{(3)}$ is an RBM with undirected interactions. The other layers form a Bayesian network with directed interactions. Specifically, the conditional distribution of the units given the layer above them corresponds to the probabilistic model associated with the logistic model [26]. That is, the probability that a unit equals 1 is the sigmoid ($\sigma$) applied to a linear transformation of the layer above it: for $H_L^{(1)}$, it is a linear transformation of $H_L^{(2)}$, and for the visible layer $V_L$, it is a linear transformation of $H_L^{(1)}$, as indicated in (1) and (2).
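The two conditional distributions can be written out explicitly. The following is a sketch in standard DBN notation and presumably matches the forms the text refers to as (1) and (2); the weight matrices $W^{(1)}$, $W^{(2)}$ and the bias vectors $b^{(0)}$, $b^{(1)}$ are assumed symbol names that the source does not confirm.

$$p\big(H^{(1)}_{L,j} = 1 \mid H^{(2)}_L\big) = \sigma\Big(b^{(1)}_j + \sum_k W^{(2)}_{jk}\, H^{(2)}_{L,k}\Big)$$

$$p\big(V_{L,i} = 1 \mid H^{(1)}_L\big) = \sigma\Big(b^{(0)}_i + \sum_j W^{(1)}_{ij}\, H^{(1)}_{L,j}\Big)$$

Here $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic sigmoid, so each unit is a Bernoulli variable whose parameter depends only on the layer directly above it.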

Our model
We introduce the framework for the proposed deep neural network for Android application classification in this section. We formulated the Android classification problem by assigning labels depicting benign and malicious apps based on reconstructed features of the dataset. Deep neural networks suit Android classification because they represent input features accurately by automatically extracting complex features from enormous volumes of data using many processing layers. This provides good computing efficiency with little complexity.
The proposed method uses a generative framework that combines interactions between the undirected variables and directed variables that constitute the inputs or hidden layer. Figure 2 presents the methodology of our study. Figure 3 presents the architecture for our proposed neural network. The input layer of the DBN network takes the input from the training split containing the relevant features from the dataset. The other layers form a Bayesian network instead, with directed interactions. To generate the data from our DBN, we applied Gibbs sampling procedure [28], between the top two layers over several iterations through algorithm 1.
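Here, $B_j$ is the bias associated with hidden unit $H_j$, and $W_{ij}$ and $V_i$ are the connection weights and the visible-layer values, respectively. After Gibbs sampling between the top two layers, we generate $H_L^{(1)}$ directly from $H_L^{(2)}$ and then generate the input $V_L$ directly from $H_L^{(1)}$, which gives us a sample or observation of the input layer $V_L$ from the DBN model. More specifically, the joint distribution of the input layer $V_L$ and our three hidden layers is

$$p\big(V_L, H_L^{(1)}, H_L^{(2)}, H_L^{(3)}\big) = p\big(H_L^{(2)}, H_L^{(3)}\big)\, p\big(H_L^{(1)} \mid H_L^{(2)}\big)\, p\big(V_L \mid H_L^{(1)}\big)$$

where $p(H_L^{(2)}, H_L^{(3)})$ is a prior. We recognize here the probability distribution of a restricted Boltzmann machine, with the weights $W$ between $H_L^{(2)}$ and $H_L^{(3)}$, the biases $b^{(2)}$ and $b^{(3)}$ for the two layers, and a normalization constant $Z$. So we have the prior

$$p\big(H_L^{(2)}, H_L^{(3)}\big) = \frac{1}{Z} \exp\Big( {H_L^{(2)}}^{\top} W H_L^{(3)} + {b^{(2)}}^{\top} H_L^{(2)} + {b^{(3)}}^{\top} H_L^{(3)} \Big)$$

In other words, given $H_L^{(2)}$, each of the hidden units in the first hidden layer is independent, and similarly for the visible layer $V_L$ given $H_L^{(1)}$; thus,

$$p\big(H_L^{(1)} \mid H_L^{(2)}\big) = \prod_j p\big(H^{(1)}_{L,j} \mid H_L^{(2)}\big), \qquad p\big(V_L \mid H_L^{(1)}\big) = \prod_i p\big(V_{L,i} \mid H_L^{(1)}\big)$$

The procedure of stacking RBMs for pre-training the neural network: We train a three-layer DBN model (Fig. 3) in this study by first training a one-hidden-layer DBN (a), which in essence is an RBM. We train that for a while and use its parameters to initialize the lower part of a two-hidden-layer DBN (b). Keeping those weights fixed, we are left to train only the top part of the DBN (i.e., the RBM part of the two-layer DBN).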
To move to the three-layer DBN (c), we again use the weights of the two-layer DBN (b) to initialize the lower part of (c); that is, we use the weights of the top layer of (b) to initialize the upper part of (c), which gives a good initialization for the top layer of (c).
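The stacking procedure can be sketched as greedy layer-wise pre-training, where each RBM is trained on the output of the one below it. This is a minimal NumPy illustration rather than the authors' TensorFlow implementation: the class `RBM`, the function `pretrain_dbn`, the one-step contrastive divergence (CD-1) update, the learning rate, and the toy layer sizes are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, epochs=50, lr=0.1):
        """Fit on `data` and return the hidden representation of `data`."""
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden probabilities given the data.
            h0 = self.hidden_probs(data)
            # One Gibbs step: sample hidden units, reconstruct the
            # visible layer, then re-infer the hidden probabilities.
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = self.visible_probs(h_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 update: data statistics minus reconstruction statistics.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)
        return self.hidden_probs(data)


def pretrain_dbn(data, layer_sizes, seed=0):
    """Train one RBM per layer; each RBM's output feeds the next one."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        layer_input = rbm.train(layer_input)
        rbms.append(rbm)
    return rbms, layer_input


# Toy binary data: 20 samples with 12 features (illustrative only).
X = np.random.default_rng(1).integers(0, 2, size=(20, 12)).astype(float)
rbms, features = pretrain_dbn(X, layer_sizes=[8, 6, 4])
```

The weights of already trained layers stay fixed while the next RBM is fitted, which mirrors the move from (a) to (b) to (c). The returned `features` are the top-layer activations that a supervised classifier could take as input during fine-tuning.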
Finally, we use the up-down approach to perform a fine-tuning procedure. DBN training has two main components. The first is to pre-train our network using numerous layers of RBMs. The second is to refine the output of the RBM stack with a feed-forward back-propagation network.
RBMs use an unsupervised learning method to learn static features of our data by reconstructing the input permissions across a set of malware and benign samples. Pre-training is carried out using the stack of RBMs such that the model learns information about the permissions of applications, which represent the features of the visible layer, and maps each Android Application Package (APK) against a list of permissions in the hidden layer. We employ the greedy layer-by-layer algorithm [30] with iterative Gibbs sampling [31], detailed earlier, for pre-training. The sigmoid belief network is then trained through fine-tuning, where we employ the back-propagation technique to obtain the best weights; this step requires labels because it is a supervised learning problem.

Experimental analysis
We first introduce the datasets used for our investigations in this section. Next, we explain the metrics used to assess the performance of the model obtained from our neural network. We follow this up by outlining the details of the experimental setup. Finally, we discuss the findings of the experiments and compare the results of our suggested model with those of baseline models as well as existing studies.

Datasets
The CIC-AAGM2017 datasets [32], developed by the Canadian Institute for Cybersecurity (CIC) and available at the University of New Brunswick (UNB), were used to train the network and validate our model. The datasets were obtained through the installation of Android samples on real devices through a semi-automated process. According to Lashkari et al. [32], the worldwide ratio of malicious to benign applications stands at 20%:80%. Therefore, the distribution of the CIC-AAGM datasets used for the experiments in this study, 400 malware apps against 1500 legitimate apps, was deemed suitable. The datasets have been used in many recent studies, including Bovenzi et al. [33], Ullah et al. [34], Tang et al. [35], and Krupski et al. [36]. Since our model employs a supervised learning approach, we needed both benign and malicious applications to fit the model. As a result, we collected all 1500 benign apps and 400 malicious apps, giving us a total of 1900 samples. The samples were collected over 10 years, spanning from 2008 to 2018. Table 1 presents the yearly breakdown of the datasets in terms of the malicious and benign samples that were collected.

Dataset pre-processing
Figure caption: We used the weights of the top layer of (b) to initialize the upper part of (c) to get good initialization to pre-train our entire network. This RBM automatically extracts meaningful features from the input vector (datasets).
In this study, we focus on a binary classification problem for Android application classification, where each observation is classified as belonging to the malware or benign class. Before training our classifiers, we ran the following pre-processing operations on our datasets.
• The CIC-AAGM2017 datasets were captured from the network traffic of 1900 Android applications that had been installed on real devices and drawn from three families: adware, general malicious applications, and legitimate apps.
• The categorical properties were converted to numerical features using the one-hot encoding [37] process.
• Binary encoding was also used to turn the non-numerical class labels into numeric categories. Since this model employs binary classification to distinguish between malicious and legitimate samples, those instances were given the numbers 1 and 0, respectively.
• The numeric characteristics have been normalized to lessen the effects of the original feature value scales. We used Min-Max Normalization for each feature, which rescales the feature range to [0, 1]. The minimum-maximum normalization formula is presented in Eq. 6:

$$W_j = \frac{S_j - \min(S_j)}{\max(S_j) - \min(S_j)} \qquad (6)$$

where $W_j$ is the jth set of normalized data and $S_j$ is a D-dimensional feature vector taken from the training dataset.
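The pre-processing steps listed above can be sketched in a few lines of plain Python. The helper names (`one_hot`, `min_max`, `encode_labels`) and the toy columns `protocols` and `durations` are hypothetical and only illustrate the transformations; they are not features confirmed to be in the CIC-AAGM2017 datasets.

```python
def one_hot(values):
    """Map each categorical value to a one-hot list (categories sorted)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def min_max(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def encode_labels(labels, positive="malicious"):
    """Binary-encode class labels: 1 for malicious, 0 for everything else."""
    return [1 if y == positive else 0 for y in labels]


# Toy rows (illustrative values only).
protocols = ["tcp", "udp", "tcp"]
durations = [2.0, 6.0, 10.0]
labels = ["benign", "malicious", "benign"]

encoded_protocols = one_hot(protocols)   # [[1, 0], [0, 1], [1, 0]]
scaled_durations = min_max(durations)    # [0.0, 0.5, 1.0]
numeric_labels = encode_labels(labels)   # [0, 1, 0]
```

In practice, the minimum and maximum would be computed on the training split only and then reused for the validation and test splits, so that no information leaks from the held-out data.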

Evaluation metrics
Four widely used evaluation metrics in machine learning are used in this study: Accuracy, Recall, Precision, and F1 Score. Accuracy is the proportion of all predictions that are correct. Recall indicates how many of the malicious apps our model identified, and Precision indicates how many of the apps flagged as malicious are actually malicious. F1 Score is the harmonic mean of Precision and Recall, which gives us a decent overall view of how well our model performed. These measures are defined in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
TP: where a malicious app sample is classified as malicious, it is recognized as TP.
FP: where a benign app sample is classified as malicious, it is recognized as FP.
TN: where a benign app sample is classified as benign, it is recognized as TN.
FN: where a malicious sample is classified as benign, it is recognized as FN.
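As a worked illustration of these definitions, the sketch below counts TP, TN, FP, and FN from label lists and derives the four metrics. The function name `classification_metrics` and the toy labels are hypothetical; malicious is taken as the positive class (1).

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 with malicious = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy predictions: 3 malicious and 5 benign samples (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

For these toy labels there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so accuracy is 0.75 while precision, recall, and F1 are each 2/3.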

Experimental setup
The simulation experiments were carried out on a physical machine running macOS Monterey version 12.6 on a 1.4 GHz quad-core Intel Core i5 processor with 8 GB of 2133 MHz LPDDR3 memory. An NVIDIA GTX 1060 GPU with 6 GB of memory, sourced from Google Colaboratory, was used as an accelerator. Our proposed DBN model and the baseline models trained for comparison were implemented with TensorFlow v2.11.0 and Keras in Python 3.
Having pre-processed the dataset the benign and malware samples were divided into separate groups. Each dataset was further divided into training, validation, and testing subsets. Only benign samples were included in the training split. The training split was used to fit the model, and the validation set is utilized to fine-tune its hyper-parameters. Finally, the test split was used to assess the model's effectiveness. We summarize internal hyper-parameters selected in Table 2.
We built our DBN network in TensorFlow in two stages: first, we built a Python class to create and use the RBM; then, once the RBM was created and our datasets were loaded, we built the DBN network. Three RBMs were employed in this study; the first had 4000 hidden units, the second had 2000 hidden units, and the third had 50 hidden units.
As a result, we generated a deep hierarchy of representations of the training dataset. Each RBM was trained separately by calling the train function, which returns the current RBM's output so that it can be used as the input for the next RBM. The learned representation of the input data is then transformed into a supervised prediction by a binary classifier. Finally, we use the output of the last hidden layer of the DBN network to classify applications. We trained our network using a dataset that includes both legitimate and malicious applications because our study used the supervised learning technique, as stated earlier. 80% of the entire dataset was used for training and the remaining 20% for testing. We then rotated the split ten ways (tenfold) so that every part of the dataset would serve as a test set that mimics unseen or new applications. This helps us measure how effectively our model performs.
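The rotation of train and test splits can be sketched with standard k-fold cross-validation. This is an assumption about the procedure: with k = 10, each test fold holds 10% of the 1900 samples, whereas an 80%/20% split would correspond to k = 5. The function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, dealt into k folds, and every fold is
    used exactly once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test


# 1900 samples, as in the dataset described in the paper.
splits = list(k_fold_indices(1900, k=10))
```

Averaging the metric over all folds gives an estimate that does not depend on a single lucky or unlucky split. With a 400:1500 class ratio, a stratified variant that preserves the ratio in every fold may be preferable.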

Experimental results
We evaluated the results of the proposed method in three stages. First, a comparison with machine learning-based models was carried out. The same dataset was used to train these models using different optimizers such as Adam [38], the limited-memory BFGS optimization algorithm (L-BFGS) [39], and stochastic gradient descent (SGD) [40]. Second, we trained deep-learning algorithms related to neural networks on the CIC-AAGM2017 datasets. Finally, we compared the proposed method with recent studies that have used related Android datasets to perform classifications. Table 3 presents the comparison of our proposed method with the baseline machine-learning classifiers trained on our datasets. For any classifier, the number of samples and class labels in the dataset is crucial to achieving good recognition accuracy. The baseline machine-learning models presented in Table 3 are established algorithms that have been applied in many classification tasks. Random Forest and Decision Tree classifiers outperformed all the other algorithms considered in the study. These two learners come closest to the DBN-based model, with classification accuracy and precision of 94.3% and 92.3%. The Logistic Regression classifier produced an accuracy of 63.2%, which made it the worst performer on the CIC-AAGM2017 datasets. However, it produced the highest F1 Score, 48.7%, of all the machine-learning models. This is because our dataset had imbalanced class labels. According to Yerima et al. [41], when the data are imbalanced, the F1 Score is a more accurate measure of performance. We may, therefore, state with confidence that logistic regression also accurately fits our dataset. These models achieved an average of 80.10% accuracy when trained on the dataset. Although the number of samples and class labels is the same as for the proposed method, our model obtained 1.86% higher accuracy.
In Table 4, we see a comparison with the state-of-the-art methods based on deep neural networks. The average classification accuracy of these models stood at 90.93%, which is far less than that of our proposed model, even though the same dataset was employed. The proposed DBN-based model achieved approximately 0.78% higher classification accuracy. Table 5 shows the comparison of our method with state-of-the-art models reported in the recent literature. The models were trained on different Android datasets for classification tasks. The class labels in these datasets are imbalanced, with the exception of the MLP model reported by Turkeret et al. [47].
We trained four back-propagation neural networks [42] using our data in a similar experimental scenario and present the results in Table 4. With accuracy, precision, recall, and F1 Score all surpassing 98%, the findings were outstanding overall. We have a model with these promising parameters. Table 5 compares our model with other existing techniques that also trained models on cutting-edge Android datasets similar to our own using deep-learning techniques. The preliminary processing of the datasets in these studies is the same as in this paper. The different entropy features of APK data are extracted, and those that can best represent Android applications are selected and classified using deep-learning algorithms.
In our work, we have also used the confusion matrix, a table layout that allows the visualization of an algorithm's performance. Figure 5a presents our method's confusion matrix for the CIC-AAGM2017 datasets. The figure shows that our approach accurately classified 98% true positives, meaning that the model predicted 1 and the true label was a malware app. True negatives are 28%, meaning that the model predicted 0 and the true label was a benign app. Therefore, the model successfully classified 98% of the malicious apps and 28% of the benign apps from the CIC-AAGM2017 datasets.
To further demonstrate the effectiveness of our approach, we have employed the receiver-operating characteristic (ROC) curve. The classification performance of the proposed model at various threshold values is measured using the AUC-ROC curve. ROC is a probability curve, and the area under it (AUC) measures separability, that is, how well the model can distinguish between the classes. The higher the AUC, the better the model is at classifying benign apps as benign and malicious apps as malware. ROC curves are frequently used in binary classification to examine a classifier's output. Figure 5b shows that our proposed model gives an AUC score of 0.79718179 for the CIC-AAGM2017 datasets. This means that a randomly chosen malicious application receives a higher score than a randomly chosen benign application about 79.72% of the time. In the tables, our results are highlighted in bold.
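The AUC can be computed directly from its pairwise-ranking definition: the fraction of (malicious, benign) pairs in which the malicious sample receives the higher score. The sketch below illustrates this; the function name `roc_auc` and the toy scores are hypothetical, and ties are counted as half a correct pair.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malicious, benign) pairs ranked correctly.

    labels use 1 = malicious, 0 = benign; a tie between a malicious
    and a benign score counts as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores (illustrative only): higher means more likely malicious.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)
```

In this toy example, 8 of the 9 malicious-benign pairs are ranked correctly, so the AUC is 8/9. The quadratic loop is fine for a sketch; for large datasets a rank-based computation is more efficient.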

Conclusions and future work
Traditional AI-based classification models fail to classify sophisticated Android malware with unknown patterns; hence, the proper classification of such apps has become one of the challenges in Android security. In this study, we presented a new application classification model based on deep learning. The proposed deep neural network can effectively model real Android application data and classify samples into benign and malicious categories. The greedy layer-wise pre-training procedure was adopted as a way of initializing better parameters for the DBN network, which was then used to train deep feed-forward neural networks. The network recognizes malicious apps from the dataset using a stochastic threshold model based on reconstruction error. Our experimental analysis shows that the model provides an intuitive yet promising approach to classifying Android applications and is suitable for other datasets. The practicality of our approach is affirmed by the comparative studies presented in Tables 3, 4, and 5. In addition, the analysis demonstrates that deep belief neural networks have good classification performance and offer promising research opportunities in Android security analysis. In future work, we plan to extend our model from this binary classification problem to the classification of specific families of Android malware.

Author Contributions
The study conception, design, and methodology were performed by MAM. MA supervised and coordinated the processes leading to the completion of the study. SA reviewed and edited the manuscript. Data curation, validation, visualization and analysis were performed by BOE.
Funding The study was funded by the authors.

Declarations
Conflict of interest There are no financial or personal conflicts of interest in this work.

Ethical approval
No data or other material from studies involving human or animal participants are included in this study.