Machine learning approach for detecting and combating bring your own device (BYOD) security threats and attacks: a systematic mapping review

The bring your own device (BYOD) paradigm, which permits employees to connect their own mobile devices to the organizational network, is rapidly changing how organizations operate by enhancing flexibility, productivity, and efficiency. Despite these benefits, security remains a concern in organizational settings. A considerable number of studies have been published in this domain without a detailed review of the security solution mechanisms. Moreover, the reviews that have been conducted focused mainly on conventional approaches such as mobile content management and application content management, and the implementation of BYOD security using these conventional methods is ineffective. Machine learning therefore appears to be a promising approach to the security problem in the BYOD environment. This study presents a comprehensive systematic mapping review of the application of machine learning to the mitigation of security threats and attacks in the BYOD environment, highlighting current trends in the existing studies. Five academic databases were searched, and a total of 753 primary studies published between 2012 and 2021 were initially retrieved. These studies were screened by title, abstract, and full text to check their eligibility and relevance. Ultimately, forty primary studies were included and analyzed in the systematic mapping review (SMR). Based on the analysis and bubble plot mapping, significant research trends were identified in security threats and attacks, machine learning approaches, dataset usage, and evaluation metrics. The SMR results show a rise in the number of investigations of malware and unauthorized access among the existing security threats and attacks. They also indicate that supervised learning approaches such as SVM, DT, and RF are the learning models most often employed in previous research. Thus, the application of unsupervised learning approaches such as clustering, and of deep learning approaches, remains an open research issue. The SMR therefore lays the groundwork for new research on machine learning in the BYOD environment, offers insight into the field, and can help researchers identify gaps in the research domain.


Introduction
Continuing advances in mobile computing have shifted interest toward innovative ubiquitous paradigms. Among these paradigms is bring your own device (BYOD) (Ballagas et al. 2004), which is attracting attention and investment (Costa et al. 2018). The BYOD paradigm permits employees to bring their own mobile devices, including tablets, laptops, and smartphones, to work and connect them to the organization's network to access its resources instead of using the organization's own devices (Samarathunge et al. 2018). Various organizations and corporations such as Citrix Systems, Unisys, Intel, Apple, and the White House are competing in the adoption of BYOD, and the paradigm is expected to receive rapid attention from leading organizations worldwide (French et al. 2014). A survey conducted by Cisco reveals that 95% of companies permit the use of employee-owned devices to some degree, 36% provide complete support for such devices, and 48% support selected devices (Barbier et al. 2012). Because organizations aim to attain high productivity and satisfy employees, they have decided to permit employees to bring their own mobile devices into the workplace.
BYOD offers several benefits. First, organizations that adopt a BYOD policy reduce costs, because employers save on high-cost devices and resources, including service agreements, hardware, licensing, software, insurance, and data plans (Caldwell et al. 2012; Wang et al. 2014). Second, employees gain flexibility, productivity, and mobility: they can work anywhere and at any time on devices they are familiar with and satisfied with (Rivera et al. 2013). Third, devices last longer, because employees usually handle their own devices more carefully than company-provided ones (Ghosh et al. 2013). Fourth, BYOD supports online education. The BYOD learning experience helps educational organizations such as Harvard edX, Khan Academy, and MIT edX to offer high-quality online tutoring at minimal cost (Miller et al. 2012). Thus, organizations adopt BYOD policies to provide convenience, flexibility, and device portability in support of their employees' workflow, enhancing their efficiency and confidence (French et al. 2014). BYOD also improves communication within an organization, online transactions, and employee interaction, and it makes remote access to corporate data from outside the organization easier.
Even though BYOD offers numerous benefits to both employees and organizations, security remains the major challenge in the BYOD environment. BYOD security is a set of technologies used to lower the risks associated with using a personal device, rather than a company-issued one, for work-related purposes. BYOD allows for the use of any desktop or mobile device, including laptops and smartphones. These BYOD endpoints can connect to corporate hardware, software, and networks, posing serious security threats (Wani et al. 2020). For instance, when data is copied to or from a mobile device, the data remains on that device even after it has been disconnected from the corporate network, which makes the disclosure of confidential data to unauthorized users much easier. Also, users tend to assume that their data is secure in all circumstances, but a malfunction within the corporate network can alter or expose confidential data saved on the mobile device. Thus, an employee's device that is permitted in the workplace may need to be configured at the highest security level in both software and hardware because of the organization's security concerns (Samarathunge et al. 2018).
To secure BYOD in an organizational network, machine learning techniques from the field of artificial intelligence are a promising alternative (Kamal et al. 2022). Machine learning is an advanced artificial intelligence technique that can perform well in a dynamic network without being explicitly programmed, and its contribution to BYOD security is substantial (Ganiyu and Jimoh 2021). Machine learning can be employed to train a model to detect different attacks and to offer corresponding preventive policies, so that an attack can be identified at an early stage. Furthermore, machine learning approaches show promising results in identifying new attacks by using their learning ability and handling them intelligently. Thus, a machine learning model can offer a potential security protocol for BYOD that provides more reliability and accessibility than other models. Meanwhile, mobile computing has developed rapidly in recent years, supported by growth in mobile terminals (such as mobile phones, Pocket PCs, PDAs, and mobile computers) and mobile networks (wireless networks, Bluetooth, GSM, 3G+, etc.). Mobile computing can benefit from machine learning techniques such as naive Bayes, C4.5, and decision trees. Mobile text categorization, malware detection on mobile devices, sensor-based activity recognition, and language understanding are important machine learning applications for mobile devices (Chaudhar and Kolhe 2013).
Currently, only a few reviews and surveys have been conducted on BYOD security implementation (Akin-Adetoro and Kabanda 2015; Eke and Anir 2021; Oktavia et al. 2016; Wang et al. 2014). For instance, Garba et al. (2015) conducted a review study on BYOD with a focus on information security and privacy challenges. The study examined current organizational practice regarding BYOD and the drawbacks of adopting it. Its findings indicate that failure to achieve data security and privacy will make BYOD adoption futile, and that a solution will be unsuccessful if the user experience is not taken into account. Similarly, Oktavia et al. (2016) conducted a systematic study of the security and privacy challenges in the BYOD environment, critically investigating the components involved in BYOD security and privacy issues. The findings maintained that organizations need to amend their security policies and adopt enhanced ones based on the identified threats. The authors added that the best solution should be able to separate personal data from corporate space, thereby protecting corporate data (Wang et al. 2014). In addition, Jamal et al. (2020) presented a systematic study of the authentication techniques employed for BYOD security implementation. The study identified the existing BYOD authentication methods, classified them by BYOD threat, and analyzed their limitations. In another study, Akin-Adetoro and Kabanda (2015) conducted a review on BYOD with a focus on small and medium enterprises (SMEs). The study highlighted the contextual issues that SMEs in developing countries need to understand before adopting BYOD. The findings indicated that organizations need to plan for change because of the ubiquitous and strategic nature of the potential IT changes that employees may introduce. The authors also argued that, because employees find it appealing to use their own devices in the workplace, SMEs in developing countries must equip themselves to reap the possible benefits of BYOD adoption. In a related study, Palanisamy et al. (2020) conducted a systematic review of compliance with BYOD security policies in organizations, using a total of 21 articles published between 2012 and 2019. The findings provide an overview of the theories employed to describe and analyze security behavior and of the factors that affect compliance with BYOD security policies. Thus, that review focused on the features and factors influencing compliance behavior in the BYOD environment.
Despite the reviews and surveys that have been conducted and reported in the literature, several limitations have been identified and summarized below.
1. The aforementioned studies followed the formal literature review approach and did not include any research questions, search strategies, data extraction processes, or data analysis. Hence, there is a need for a more systematic approach to reviewing the existing knowledge in BYOD security implementation.
2. The existing reviews provided a systematic study only of the security threats, authentication, mitigation mechanisms, and privacy of BYOD security implementation.
3. A few of the studies concentrated on the conventional BYOD security models for mitigation, and none of the reviews provided a comprehensive review of the BYOD security mechanisms currently in place, such as machine learning methods for secure BYOD solutions.
The drawbacks identified in the current reviews motivated the proposed study, which aims to conduct a systematic mapping review of the research literature on the application of machine learning as a mitigation mechanism against BYOD security threats and attacks. The purpose of this systematic mapping review (SMR) is to provide an adaptable and reliable evaluation of BYOD implementation from the perspective of the artificial intelligence research domain (Juárez and Cedillo 2017). The SMR was conducted by answering four formulated research questions. A total of 753 articles published from 2012 to 2021 were initially retrieved from five academic databases. After screening against the inclusion and exclusion criteria, 40 primary studies were selected.
The major contributions of this study are outlined below:
1. A detailed background study of the existing security threats and attacks in the BYOD environment.
2. A comprehensive review of the machine learning techniques for addressing security threats and attacks in BYOD environments, covering dataset usage, machine learning approaches, and performance metrics.
3. An analysis of the security threats and attacks, machine learning approaches, datasets, and evaluation metrics based on the primary studies selected for the systematic mapping review.
4. Percentage distributions and bubble plots that demonstrate the research trends in the BYOD threats and attacks, machine learning approaches, datasets, and performance metrics used in BYOD security implementation.
The rest of the article is organized as follows. Section 2 presents the review method. Section 3 describes the machine learning approaches for BYOD security implementation. Section 4 provides the systematic mapping results. Section 5 discusses the findings of the systematic mapping review. Section 6 presents the threats to validity, and Sect. 7 gives the concluding remarks and future directions.

Review method
Systematic mapping is a process of exploring existing studies to obtain an overview of the results and the types of research that have been conducted in a particular research area. It identifies the type of research, the quality of research, and the available output, and it displays publication trends by mapping publication frequencies over time. It also provides a summary of the research domain. According to Petersen et al. (2008), "systematic mapping provides a structure of the types of reports and results that have been published by classifying them, and it often provides a visual summary through the mapping of its results." By contrast, a systematic literature review investigates the relevant existing literature in a particular research field and carries out an in-depth review, evaluation, interpretation, and description of the methodology and results (Keele 2007). Over the years, many researchers have followed the systematic mapping guideline provided by Petersen. Although systematic mapping studies are most common in software engineering, they are not limited to that field and have been adopted in other research fields using the same guideline. This guideline has proved beneficial for describing a study domain or sub-domain (Abdelmaboud et al. 2015; Cavalcante et al. 2016; Fernandez et al. 2015).
In addition, combining the Petersen guidelines with the Kitchenham systematic literature review guidelines has become common in systematic mapping reviews. This combined approach enriches the earlier one by making systematic mapping studies more comprehensive and supporting more robust conclusions. Therefore, this study combines the Petersen systematic mapping review guideline (Petersen et al. 2008) and the Kitchenham systematic literature review guideline (Kitchenham and Brereton 2013). The systematic mapping process is usually performed in five major phases, where the output of each phase provides the input for the next. The phases are depicted in Fig. 1, as demonstrated by Petersen et al. (2008):
Phase 1: Definition of research questions and the corresponding research objectives.
Phase 2: Definition of the search strategy and relevant studies selection process.
Phase 3: Performing the screening and selection criteria (inclusion and exclusion).
Phase 4: Performing the classification scheme, which is the core structure of the systematic mapping.
Phase 5: Data extraction and mapping process based on the classification scheme to show the research trends (Fig. 2).
Kitchenham and Brereton (2013) maintained that research questions are expected to determine the problems being addressed and the aim of the research method. This study aims to explore the existing primary studies on the security threats and attacks on BYOD, and on the application of machine learning as a mitigation mechanism for those threats and attacks, in order to identify research trends, open issues, and directions for further research in the domain. Given the need for a systematic mapping study and in order to achieve the aim of this research, the research questions and the corresponding research objectives have been defined as depicted in Table 1.

Search strategy and relevant studies selection process
Our search for suitable articles involved three sequential activities: keyword identification, search strategy formulation, and data source selection. The keywords were identified and the search strategy was created based on the content of the research questions. The keywords were then refined after the initial search: some keywords were merged, and searching was conducted in several iterations. Articles published in 2012-2021 were included in the current study. This time frame was chosen because it is the period during which enterprises began permitting consumer devices on the enterprise network (Micro 2012).
To obtain suitable articles, we considered relevant articles published in five academic databases. Four primary databases were searched: ACM Digital Library, IEEE Xplore, Springer, and Science Direct. The choice of databases is based on Petersen's suggestion (Petersen et al. 2008). Moreover, the broad coverage of these databases provides access to BYOD literature from related disciplines. In addition, Google Scholar was employed as a fifth, complementary database to broaden the search and capture articles that may have been missed in the other databases (Table 2).
The search strategy began with wide coverage by using keywords to query for articles on "Application of Machine Learning Approaches for Mitigating Bring Your Own Device (BYOD) Security threats and attacks". The search terms and synonyms were formulated from the related studies around three concepts: bring your own device, security threats and attacks, and machine learning techniques. All these terms and synonyms were included in the Search Query (SQ), which is shown below.
Query_1 = "bring your own device" OR "b.y.o.d" OR "mobile device" OR "tablets" OR "personal device" OR "personal smartphone" OR "personal mobile" OR "notebooks" OR "personal laptop" OR "personal tablet".
Query_2 = "spoofing attack" OR "intrusion attack" OR "malware" OR "dos attack" OR "eavesdropping" OR "man-in-middle attack" OR "advanced persistent threat" OR "phishing attack" OR "lost device" OR "viruses".
Query_3 = "machine learning" OR "deep learning" OR "supervised learning" OR "unsupervised learning" OR "clustering" OR "semi-supervised learning" OR "reinforcement learning".
SQ = Query_1 AND Query_2 AND Query_3.
The search string was applied to the selected databases to retrieve the academic literature.

Table 1 Research questions and the corresponding research objectives
RQ1 Question: What are the potential security threats and attacks that are found in the BYOD environment? Objective: To examine the potential security threats and attacks that are found in the BYOD environment.
RQ2 Question: What are the different machine learning algorithms employed for detecting and combating security threats and attacks in BYOD environments? Objective: To identify various machine learning techniques employed for detecting and combating the security threats and attacks in the BYOD environment.
RQ3 Question: What datasets do researchers employ in machine learning algorithms for detecting and combating security threats and attacks in the BYOD environment? Objective: To identify various datasets that have been employed by researchers in the machine learning approach for detecting and combating security threats and attacks in the BYOD environment.
RQ4 Question: What are the evaluation metrics that are employed to evaluate the performance of machine learning algorithms for detecting and combating security threats and attacks in the BYOD environment? Objective: To investigate the evaluation metrics that are employed to evaluate the performance of machine learning algorithms for detecting and combating security threats in the BYOD environment.

The selected articles were then used for a snowballing process. Snowballing is a search strategy that uses the articles already obtained to find new ones. It can be done in two ways: by examining a publication's reference list (backward snowballing) or by looking for articles that cite the identified publications (forward snowballing). Both forward and backward snowballing were performed in this study. The study selection criteria were also applied to the publications found through snowballing, and those that passed the check were added to the list of chosen articles. The search returned a total of 753 articles. Table 3 describes the search process and the final number of studies obtained.
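The composition of the search query described above can be sketched in a few lines of Python. This is only an illustration of how the three OR-groups combine under AND; the helper names `or_group` and `build_search_query` are hypothetical, and each database's actual query syntax may differ.

```python
# Term lists copied from the search strategy; helper names are illustrative.
QUERY_1 = ["bring your own device", "b.y.o.d", "mobile device", "tablets",
           "personal device", "personal smartphone", "personal mobile",
           "notebooks", "personal laptop", "personal tablet"]
QUERY_2 = ["spoofing attack", "intrusion attack", "malware", "dos attack",
           "eavesdropping", "man-in-middle attack", "advanced persistent threat",
           "phishing attack", "lost device", "viruses"]
QUERY_3 = ["machine learning", "deep learning", "supervised learning",
           "unsupervised learning", "clustering", "semi-supervised learning",
           "reinforcement learning"]


def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_search_query(*groups):
    """Combine OR-groups with AND, mirroring SQ = Query_1 AND Query_2 AND Query_3."""
    return " AND ".join(or_group(g) for g in groups)


SQ = build_search_query(QUERY_1, QUERY_2, QUERY_3)
print(SQ)
```

Wrapping each group in parentheses keeps the OR terms together, so the AND applies between concepts rather than between individual keywords.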

Screening and selection criteria
This study adopted the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to report the search results and to make clear which primary studies were included in or excluded from the analysis. The final selection of literature after the screening process is depicted in Fig. 3, following the PRISMA guideline.

Inclusion criteria
Inc2 Articles must have been published between 2012 and 2021
Inc3 Articles must be research articles (journal articles, conference papers, or published theses) related to BYOD security implementation using a machine learning/deep learning approach
Inc4 Articles must mention security threats/attacks in the BYOD environment
Inc5 Articles must mention machine/deep learning performance evaluation in the BYOD environment

Exclusion criteria
Exc1 Articles that did not study BYOD/personal device/mobile device/smart device security
Exc2 Articles that consist of reviews, abstracts, or presentations
Exc3 Articles that do not meet any of the inclusion criteria
Exc4 Articles that are not available/accessible in electronic format

As stated in the previous section, five databases were chosen to search for relevant studies: ACM Digital Library, IEEE Xplore, Springer, Google Scholar, and Science Direct. After the keywords were searched in these online databases, a total of 753 articles were extracted. The search output was comparatively large, but this is a normal characteristic of this form of research (Kitchenham et al. 2009). EndNote was employed as reference manager software to manage the articles. After the queries were completed, duplicate articles were identified and eliminated, leaving a total of 450 articles to which a more refined selection process was applied. This selection was performed in two stages. In the first stage, the titles and abstracts of the articles were checked against the inclusion and exclusion criteria (Table 3) and unrelated articles were eliminated. At this stage, 450 studies were retained. In the second stage, the selection was made by reading the full texts of the included papers, and this stage returned 65 articles. An in-depth reading and analysis of each of the 65 studies was then performed to verify that they indeed contribute to the aim of the study. Finally, 40 articles were used in the present systematic mapping review, as depicted in Fig. 3.
The distribution of the forty (40) selected primary studies is depicted in Fig. 2. Of the 40 selected studies, as can be seen in Fig. 2, 21 were selected from IEEE Xplore, 4 from ACM Digital Library, 6 from Google Scholar, 2 from Science Direct, and 7 from Springer (Fig. 4).
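As a quick check on the distribution above, the per-database counts can be converted to percentage shares. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts of selected primary studies per database, as reported in the text.
selected = {
    "IEEE Xplore": 21,
    "ACM Digital Library": 4,
    "Google Scholar": 6,
    "Science Direct": 2,
    "Springer": 7,
}

total = sum(selected.values())
shares = {db: 100 * n / total for db, n in selected.items()}

# Print the databases from the largest to the smallest contribution.
for db, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db:20s} {selected[db]:2d} studies  {pct:5.1f}%")
```

The counts sum to the 40 selected studies, with IEEE Xplore contributing just over half of them.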

Classification process
The classification scheme of this study was established following the Petersen et al. (2008) guideline, which involves studying the abstracts of the articles and searching for keywords and ideas that reveal the contribution of each primary study. The classification scheme helps to develop a categorization that represents the population of primary studies, to build a detailed understanding of the nature and contribution of each selected study, and to ensure that the expected results are included in the SMR (Fatima and Colomo-Palacios 2018; Petersen et al. 2008). The classification scheme is built using keywording. The classification scheme and the classification process are shown in Fig. 5. The steps involved in the classification scheme are itemized below:
• Keywording: Reading the abstract and looking for keywords that characterize the context associated with the SMR objectives (Fatima and Colomo-Palacios 2018).
• Article sorting into the classification scheme: Sorting the articles into the classification scheme as they are added to it (Fatima and Colomo-Palacios 2018). The classification scheme itself is produced by reading the introduction and conclusion of each selected primary study.
• Classification scheme update: Modifying the scheme after a primary study has been added to it (Fatima and Colomo-Palacios 2018).
The research questions of this study were addressed by extracting information from the selected articles using the following categorization.
1. Security threats and attacks
2. Machine learning approaches
3. Datasets
4. Performance metrics
1. Threats and attacks
• Threats: Threats are potential security disruptions that occur where there is an entity, action, or occasion that can break up security and cause damage. Thus, a threat is something that brings about vulnerability to security (Stallings 2006).
• Attacks: A violation of information security that is caused by an intelligent threat. It is an effort to get illegal access to data or resources in a malicious form with the aim to harm the information systems (Stallings 2006).
2. Machine learning approaches: These are the learning algorithms employed to mitigate the security threats and attacks in BYOD environments.
3. Datasets: These are various datasets employed by researchers in machine learning security implementation in the BYOD environment for mitigating security threats and attacks.
4. Performance metrics: Performance metrics are the performance parameters employed to measure the machine learning security implementation in the BYOD environments.
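To make the performance metrics category concrete, the sketch below computes four metrics that are commonly reported for detection models (accuracy, precision, recall, and F1) from confusion-matrix counts. The choice of these four metrics and the example counts are assumptions made for illustration; they are not taken from the reviewed studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a malware detector (not from any reviewed study):
# tp = attacks flagged, fp = benign flagged, tn = benign passed, fn = attacks missed.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name:9s} {value:.3f}")
```

Reporting precision and recall alongside accuracy matters in security settings, where attack samples are often much rarer than benign ones.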

Data extraction and mapping process
In the data extraction phase, we adopted the established SMR approach provided by Petersen et al. (2008) for data collection. The classification scheme already formulated for the machine learning approaches was used to sort the data extracted from the relevant literature. The data extraction process was documented in an Excel spreadsheet. Next, the publication frequencies in each classification were analyzed using tables (Petersen et al. 2008). The publication frequencies were the main focus for examining the trends in each category, so as to recognize the categories that have been emphasized in existing research and, consequently, to find gaps and possible research directions. The analysis and presentation of results followed the process itemized below.
a. Illustrate the statistical summary of the data using tables that indicate the publication frequencies in each category of the classification scheme (Petersen et al. 2008).
b. Report the publication frequency using a bubble plot, which is essentially two X-Y scatter plots with bubbles at the category intersections. The size of each bubble is proportional to the number of articles that belong to the pair of categories corresponding to the bubble's coordinates (Petersen et al. 2008).
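The two reporting steps above can be sketched as a counting exercise. In this illustration the extraction records are hypothetical (they are not the data of this review); each record pairs a threat category with a learning approach, and the bubble size is simply the number of records at each intersection.

```python
from collections import Counter

# Hypothetical extraction records: (threat/attack category, learning approach).
records = [
    ("malware", "supervised"),
    ("malware", "supervised"),
    ("malware", "deep learning"),
    ("unauthorized access", "supervised"),
    ("unauthorized access", "unsupervised"),
    ("phishing", "supervised"),
]

# Step a: publication frequency per category (the tabular summary).
threat_freq = Counter(threat for threat, _ in records)
approach_freq = Counter(approach for _, approach in records)

# Step b: bubble sizes, one per (x, y) category intersection.
bubbles = Counter(records)

for (threat, approach), size in sorted(bubbles.items()):
    share = 100 * size / len(records)
    print(f"{threat:20s} x {approach:14s} size={size} ({share:.1f}%)")
```

The resulting counts could be passed to any plotting tool as bubble areas; intersections that never appear in `bubbles` correspond to the empty cells that indicate research gaps.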

Machine learning techniques for detecting and combating security threats and attacks in bring your own device (BYOD) environment
Machine learning is a branch of artificial intelligence that builds models from extracted data in order to predict future outcomes. The learning algorithm is given a set of instructions that enable it to understand the nature of the data. The core concept of machine learning is the design of algorithms that enable a machine to recognize a set of data and classify it based on the attributes of the data. Learning takes place on data extracted by the algorithm after noise has been removed (Conway and White 2012). Classification techniques help the learning algorithm to make effective decisions. Machine learning is capable of evaluating past and existing risks to obtain improved future performance (Blum and Langley 1997). The major types of machine learning algorithms usually employed in BYOD security implementation, namely supervised, unsupervised, semi-supervised, reinforcement, and deep learning, are briefly described below. Supervised learning approaches can be used to detect threats and attacks in a BYOD environment and to create countermeasures. Supervised learning is the most commonly used form of learning in ML; the algorithm learns from labelled training data to map inputs to outputs. Supervised learning falls into two categories, classification and regression (Tahsien et al. 2020). In classification, the output is a fixed or categorical value, which could be represented as [Yes or No] or [True or False]. Examples of supervised classification algorithms include support vector machines, decision trees, random forests, k-nearest neighbor, association rules, and Bayesian classifiers. In regression, by contrast, the output is a continuous value that depends on the input variables. Examples of regression algorithms include neural networks, decision trees, and ensemble learning.
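As an illustration of supervised classification, the following sketch implements k-nearest neighbor, one of the algorithms listed above, using only the Python standard library. The two features and all sample values are hypothetical and chosen only to show how labelled examples determine the predicted class.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (permissions requested, network connections per minute).
train_x = [(3, 1), (4, 2), (5, 1), (20, 15), (22, 18), (25, 14)]
train_y = ["benign", "benign", "benign", "malware", "malware", "malware"]

print(knn_predict(train_x, train_y, (21, 16)))
print(knn_predict(train_x, train_y, (4, 1)))
```

Because the output is a category rather than a number, this is a classification task; replacing the majority vote with an average of neighboring numeric targets would turn the same idea into regression.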
Unsupervised learning is a type of learning algorithm employed in complex data analysis and categorization. In unsupervised learning, there is no target value for a given input. This type of learning does not require labelled data; it examines unlabelled data and groups it into clusters. Unsupervised learning techniques have been employed in BYOD security, for example for privacy protection using the infinite Gaussian mixture model (IGMM) and for detecting DoS attacks using multivariate correlation analysis (Tan et al. 2013). Examples of unsupervised learning algorithms include k-means clustering and principal component analysis.
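A minimal sketch of k-means clustering, one of the unsupervised algorithms named above, is given below. It uses plain Python, takes the initial centroids as an explicit argument so that the result is reproducible, and runs on hypothetical traffic features; no labels are supplied at any point.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical unlabelled features: (packets per second, mean packet size in KB).
points = [(10, 1.0), (12, 1.2), (11, 0.9), (300, 0.1), (320, 0.2), (310, 0.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The algorithm separates the low-rate and high-rate traffic into two groups on its own; deciding whether a group represents an attack would still require interpretation by an analyst or a downstream model.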
Semi-supervised learning, on the other hand, combines supervised and unsupervised learning algorithms (Shah and Shankarappa 2018). The semi-supervised learning algorithm thus sits between supervised and unsupervised learning and can deal with datasets in which some observations are labelled and others are unlabelled. In many practical circumstances, the cost of labelling a dataset is quite high, since labelling requires human expert opinion. Thus, when only a few of the observations are labelled and the majority are not, semi-supervised learning is deemed a suitable approach for model construction (Hussain et al. 2020).
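One simple way to exploit a few labelled observations together with many unlabelled ones is self-training. The sketch below is an assumption-laden illustration, not a method taken from the reviewed studies: it uses a nearest-centroid rule as the base classifier, and in each round it pseudo-labels the unlabelled point that lies closest to a class centroid. All data values are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def self_training(labeled, unlabeled):
    """Grow the labelled set by pseudo-labelling the most confident point each round."""
    labeled = {label: list(pts) for label, pts in labeled.items()}  # copy inputs
    pool = list(unlabeled)
    while pool:
        centers = {label: centroid(pts) for label, pts in labeled.items()}
        # Most confident = unlabelled point closest to any class centroid.
        _, point, label = min(
            (math.dist(p, c), p, lab) for p in pool for lab, c in centers.items()
        )
        labeled[label].append(point)
        pool.remove(point)
    return labeled


# Hypothetical data: one expert-labelled example per class, four unlabelled points.
seed_labels = {"normal": [(1.0, 1.0)], "attack": [(9.0, 9.0)]}
unlabeled = [(2.0, 1.5), (8.0, 8.0), (3.0, 2.5), (7.0, 7.0)]

result = self_training(seed_labels, unlabeled)
print(result)
```

Only two observations needed expert labelling here; the remaining four were assigned by the model itself, which is the cost saving that motivates semi-supervised learning.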
Reinforcement learning is a type of learning that is commonly employed in gaming environments. In this form of learning, the algorithm (the agent) learns by interacting with its environment, much as humans do, executing actions that increase its overall feedback (Mnih et al. 2015). The feedback is a reward that depends on the outcome of the task performed. In reinforcement learning, no action is prescribed in advance for any task; instead, the algorithm proceeds by trial and error. Thus, the learning agent identifies from its experience, and then applies, the behaviour that earns the best reward.
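The trial-and-error idea can be sketched with an epsilon-greedy agent choosing between two actions. This is a toy illustration only: the reward probabilities, the function name `epsilon_greedy`, and the parameter values are assumptions, and the reviewed studies may use quite different reinforcement learning methods.

```python
import random

def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best by mixing random exploration with greedy exploitation."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)   # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))                           # explore
        else:
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]      # incremental mean
    return estimates, counts

# Hypothetical environment: action 1 is rewarded far more often than action 0.
estimates, counts = epsilon_greedy([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is better; it discovers this from the rewards it receives.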
Deep learning, a subset of machine learning, is another learning model usually employed in the implementation of BYOD security. Deep learning is a machine learning approach whose architecture is centered on artificial neural networks (ANNs). Artificial neural networks are learning algorithms inspired by the brain, although this does not imply that ANNs work in the same way as the biological brain. A neural network consists of neurons (referred to as variables) connected via weighted connections (usually regarded as parameters). The network is trained with either a supervised or an unsupervised learning approach, using labelled or unlabelled data respectively, to attain the desired performance. Learning proceeds by iteratively modifying the weights between pairs of neurons. Thus, deep learning refers to larger neural networks, where the term deep denotes the number of layers in the network (Yang et al. 2014). In the early days of artificial neural networks, it was hard to train even relatively small networks because of constraints in computational power. However, advances in technology, such as graphics processing units (GPUs), have made estimating the optimal network weights more efficient, which permits the construction of larger networks containing more hidden layers. Although it is not a strict rule, artificial neural networks that contain more than one hidden layer are regarded as deep learning models. Deep learning models used in BYOD implementations include convolutional neural networks, recurrent neural networks, and autoencoders.
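To make the iterative modification of weights concrete, the sketch below trains a single artificial neuron with gradient descent. It is deliberately not a deep network; a deep model stacks many such units in layers. The toy data, learning rate, and function names are assumptions chosen for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Repeatedly nudge the weights and bias to reduce the prediction error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target   # gradient of the log loss with respect to the pre-activation
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    return 1 if sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) >= 0.5 else 0

# Hypothetical labelled data: flag a session (1) if either of two binary indicators is raised.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```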

Review of machine learning techniques for detecting and combating security threats and attacks in bring your own device (BYOD) environment
This section provides a review of machine learning techniques for addressing security threats and attacks in the BYOD environment. The review is based on three aspects of machine learning implementations for BYOD: the dataset used, the machine learning algorithm employed, and the performance measures adopted. Accordingly, this section is organized into three subsections. The first looks into the different datasets used in machine learning-based approaches to BYOD security threats and attacks. The second discusses the various machine learning algorithms employed, while the third reviews the performance evaluation metrics that different authors considered when assessing their machine learning implementations. This section follows the same concept used by Eke et al. (2019a, b). The summary of the review is shown in Table 4.

Review of datasets employed in machine learning algorithm for detecting and combating security threats and attacks in the BYOD environment
Learning models are generally based on past occurrences or experiences of an event or scenario. A record of such experiences is referred to as a dataset, which is a key element used to train, test, and implement BYOD security models. Thus, the first step in implementing machine learning techniques for detecting and combating security threats and attacks in the BYOD environment is dataset gathering. The findings summarized in Table 4 show the different datasets used to implement the machine learning approaches. Analysis of the selected studies shows that datasets can be broadly classified as homogeneous or heterogeneous. When the authors use one type of dataset, it is termed a homogeneous dataset. When more than one type of dataset is used, it is called a heterogeneous dataset. The datasets are reviewed below according to this classification.

Homogeneous datasets
In a homogeneous dataset, the authors employed only one type of data. For instance, Shah and Shankarappa (2018) utilized a homogeneous data source, the MDM events log. MDM here is a scheme implemented in a BYOD environment to control and monitor the role of smartphones, including their data operations. In a separate study, Chizoba et al. (2020) used homogeneous data generated from network traffic logs when packets are transferred between networks. The network logs were used to implement the machine learning approach for BYOD security threats and attacks. Muhammad et al. (2017) leveraged inter-arrival time (IAT) packet data obtained from the local network of the Georgia Institute of Technology. The packet IAT data of 27 mobile devices, such as tablets, laptops, and smartphones, were collected over the UDP, TCP, and ICMP protocols to evaluate device type profiling. The dataset is homogeneous in nature, as it contains only the inter-arrival times of packets sent in a BYOD environment. In a related study, Muhammad et al. (2019) employed a test-bed dataset carefully gathered from mobile devices without interference. In another study, Petrov and Znati (2018) utilized the MIT dataset, which is made up of 84 issues of phone event records such as call start time, incoming/outgoing direction, and the type of call (phone, data, or message). Eslahi et al. (2016), in their study on botnet detection, utilized a network traffic dataset generated from a mobile botnet. A data sieving approach was employed in the model to gather only the HTTP traffic records exchanged during HTTP and server communication. In another study, Riasat et al. (2017) utilized a publicly available Android malware dataset gathered from Contagio Mobile. The data contain 600 samples composed of two segments (reptiles crawling and the malicious applications of the Contagio library).
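For readers unfamiliar with inter-arrival times, the feature is simply the gap between consecutive packet timestamps. The short sketch below shows the calculation on made-up timestamps; the values and the helper name `inter_arrival_times` are hypothetical and do not reproduce the data of Muhammad et al. (2017).

```python
from statistics import mean

def inter_arrival_times(timestamps):
    """Return the gaps, in seconds, between consecutive packet arrival times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical arrival times (seconds) of packets captured from one device.
packet_times = [0.00, 0.25, 0.75, 1.00, 2.00]
iats = inter_arrival_times(packet_times)
print(iats, mean(iats))
```

A series of such gaps per device is the kind of single-feature, homogeneous input that a device-profiling model could be trained on.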

Heterogeneous datasets
In a heterogeneous dataset, the data are obtained from various sources; the study conducted by Arora and Bhatia (2019) is one example. In another study, Yerima et al. (2013) used 2000 samples, of which 1000 were malware and the other 1000 were benign. The authors asserted that the malware samples showed higher variability than the benign samples when 20 features were chosen. In a related study, Chen et al. (2016) employed two distinct dataset sources (benign and malware) amounting to 7970 samples: 4350 benign samples and 3620 malware samples. The study does not analyze which dataset achieved the better result. Similarly, Lashkari et al. (2017), in their proposed framework for Android malware characterization and detection, utilized both benign and malware datasets to train a machine learning classifier. The authors collected 1527 benign apps from the Google Play market between 2015 and 2016, selecting apps according to their popularity in each category present in the market. However, the authors stated that 27 of the apps were eliminated before the modeling phase because they were classified as suspicious by two different anti-virus products. In addition, 400 malware apps were collected in two classes: adware (250 apps) and general malware (150 apps). The adware category consists of different families, including Airpush, Dowgin, Kemoge, Mobidash, and Shuanet. Finally, the authors utilized Droidkin, a lightweight Android app similarity detector, to find the relationships among the apps in each category of the dataset (general malware, adware, and benign).

Table 4 The review summary of the machine learning approaches implementation for BYOD security threats and attacks

Review of machine learning algorithm for detecting and combating security threats and attacks in the BYOD environment
Based on these research findings, different machine learning approaches have been employed in the implementation of BYOD security. A comprehensive summary of the findings is presented in Table 1, illustrating the different algorithms used by different researchers to address security threats and attacks in a BYOD environment. Certain authors use several algorithms in order to determine which performs better. For example, Shah and Shankarappa (2018) employed SVM, MLP, BN, and RF, of which SVM outperformed the other three algorithms, returning low false positive, low true negative, and the highest performance accuracy. Thus, based on their findings, SVM performed best for BYOD security threats and attacks. Chizoba et al. (2020) utilized SVM, DT, RF, and ensemble algorithms. Ensemble learning was used to combine the outputs of the three individual algorithms, and within the ensemble combination model the RF algorithm put up the best vote. Similarly, Naive Bayes, RF, and SVM algorithms were adopted by Sokolova et al. (2017) for anomaly detection in BYOD environments. The authors reported only the results achieved using the NB model because it performed far better than the other schemes. Muhammad et al. (2017) modelled an intelligent filtering approach for BYOD security, using K-means to isolate incidents and separate clusters of normal behaviour from abnormal behaviour in a BYOD environment. In a related study, Muhammad et al. (2019) leveraged the Clustering-based Multivariate Gaussian Outlier Score (CMGOS) to identify irregular device behaviours. CMGOS consists of clustering and density approximation schemes. The K-means algorithm was employed for clustering, while the density approximation used a multivariate Gaussian algorithm. The K-means scheme recorded some inconsistencies in the result.
Hence, k-means was used to organize the limits (Centroid 1 and 2), and the outcome served as input to the density approximation used to implement the model. To control unauthorized access in the BYOD environment, Petrov and Znati (2018) relied on artificial neural networks and decision tree algorithms to detect any unauthorized attempt by adversaries to access sensitive information. In addition, the model further confuses the adversaries' access in order to secure the data. Eslahi et al. (2016) leveraged the J48 form of Decision Tree (DT) to categorize the data and hence analyze the network behaviour. The J48 DT can proficiently detect recurring events in a mobile HTTP botnet. Yerima et al. (2013) employed the Naïve Bayes classifier to identify malware on Android devices. The authors noted that the Bayesian model is capable of handling both expert and learning schemes much better than other learning algorithms. In another study, Chen et al. (2016) used multiple learning algorithms, including SVM, DT, ANN, NB, K-NN, and a Bagging predictor, to detect malware in an Android environment. The results indicate that the K-NN algorithm outperformed the other learning algorithms. Riasat et al. (2017) adopted SVM and random forest learning models to detect the behaviour of Android malware. The authors showed experimentally that the RF algorithm produced a better result than the SVM algorithm given the same processing time interval. The K-Nearest Neighbors algorithm was employed by Gangwal and Conti (2019) to classify time series in order to detect covert cryptomining in mobile environments. The model operates with or without access rights to the suspicious device.

Review of evaluation metrics that are employed to evaluate the performance of machine learning algorithm for detecting and combating security threats and attacks in the BYOD environment
Performance measures are the metrics used to evaluate the performance of machine learning classification on BYOD security threats and attacks. Researchers employed several evaluation metrics, such as accuracy, precision, recall, and F-score, to ascertain the performance of their BYOD models. These metrics can be calculated from the counts of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN), which constitute the components of the confusion matrix. The choice of evaluation metric depends on the researcher's aim and expertise. In this regard, Yerima et al. (2013) utilized several metrics, including false negative, true positive, false positive, true negative, precision, accuracy, and error rate. These metrics were used to measure the performance of the model at different folds, and the authors maintained that 15 to 20 features can provide good performance. In another study, Eslahi et al. (2016) used accuracy, detection rate, and false alarm to assess the performance of the model, obtaining 98.60%, 96.35%, and 1.25% respectively. In a separate study, Chen et al. (2016) leveraged true positives, false positives, ROC, precision, recall, and accuracy to ascertain how well the model can detect malware in an Android environment. Aneja et al. (2018) utilized the accuracy metric to assess the performance of the model, which showed an overall accuracy of 86.7%. The study by Daniel et al. (2018) utilized recall, precision, and accuracy for evaluating the performance of the model. The model returned a reliable accuracy/precision result of over 99% at each run.
Shah and Shankarappa (2018) used TP, TN, FP, FN, and accuracy to assess the performance of the BYOD security model they developed. In a separate study, Sokolova et al. (2017) relied on true positives, false positives, false negatives, and true negatives to assess the performance of the BYOD security model built. Muhammad et al. (2019) leveraged outlier secure accuracy to ascertain the performance of their BYOD scheme. The result shows that for 9, 100, and 324 IAT points, 99.3% and 0.7% outlier secure accuracy was achieved in normal and abnormal profiling respectively. In the same year, Arora and Bhatia (2019) employed false acceptance rate, false rejection rate, accuracy, and average classification error. A corresponding performance result was reported for each evaluation metric and dataset. Similarly, standard evaluation metrics such as accuracy, precision, recall, and F1-score were adopted by Gangwal and Conti (2019) to assess the performance of a BYOD model. Precision and F-measure averaged 88% and 87% respectively. Chizoba et al. (2020), in their study to identify advanced persistent threats using ensemble classifiers, employed several evaluation metrics, including true positive, false positive, precision, recall, F1-score, MCC, ROC, and PRC, to carefully assess the performance of the developed BYOD security scheme. The reviewed studies show that most of the related work employed accuracy, recall, precision, and F-measure to evaluate the performance of the machine learning model. However, employing only such metrics may not be enough, because the datasets are imbalanced in some cases. In such instances, AUC is a more suitable metric for evaluating the model.
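The relationship between the confusion matrix and the metrics reviewed above can be shown with a short sketch that uses the standard definitions of accuracy, precision, recall, F1-score, and false positive rate. The counts below are invented for illustration and do not come from any reviewed study; the function assumes all denominators are non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common evaluation metrics from confusion-matrix counts (non-zero denominators assumed)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called detection rate or true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical detector output on 1000 apps, 120 of which are malicious.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case accuracy is 0.96 while recall is only 0.75, which illustrates why accuracy alone can look flattering when malicious samples are the minority class.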

Results
The systematic mapping results and discussion on machine learning-based techniques for detecting and combating security threats and attacks in the BYOD environment are provided in this section. Table 4 lists the primary studies included in this research. A total of 40 articles published between 2012 and 2021 were finally selected. Five academic databases were used to retrieve the primary studies: ACM Digital Library (4), IEEE Xplore (21), Springer (7), Google Scholar (6), and Science Direct (2). IEEE Xplore thus contributed the majority of the articles. The results of the mapping, grouped according to the formulated research questions (RQ1 to RQ4), are presented below.

RQ1
What are the potential security threats and attacks that are found in the BYOD environment?
To answer this question, Table 5 provides a summary of the existing security threats and attacks that have been addressed using machine learning approaches in a BYOD environment. Based on the table, this study identified 12 such threats and attacks. The percentage distribution of the selected studies is depicted in Fig. 6. The analysis in Fig. 6 shows that the threats and attacks most often addressed in the existing studies are malware, in 23 studies (57.5%), and unauthorized access, in 6 studies (15%), out of the 40 selected studies. On the other hand, spoofing attacks, botnets, and user privacy breaches are each represented in 2 studies (5%). Fig. 7 shows the bubble plot of the security threats and attacks addressed using machine learning approaches in the BYOD environment. In the plot, the X-axis represents the security threats and attacks while the Y-axis represents the year. The plot shows that malware and unauthorized access are gaining attention and dominance in the research domain. Research on malware attacks rose in 2016 but later declined in 2020. However, little work using machine learning approaches has been done on the other threats and attacks, such as persistent threats, data leakage, data theft, DoS attacks, intrusion attacks, and untrusted networks in the BYOD environment.

RQ2
What are the different machine learning algorithms employed for detecting and combating security threats and attacks in BYOD environments?
Owing to the limitations of conventional approaches to security threats and attacks in the BYOD environment, previous researchers have employed machine learning to mitigate them. This study identified 14 machine learning approaches that have been employed to mitigate security threats and attacks: SVM, DT, RF, ANN, KNN, DNN, NB, BN, LR, GDA, LDC, RT, Clustering, and Ensemble. Figure 8 illustrates the percentage distribution of the machine learning approaches implemented by previous researchers. It can be observed from Fig. 8 that the most employed approaches are the supervised machine learning algorithms SVM, DT, and RF, with a percentage distribution of 20%, 14%, and 15% respectively (Table 6). Moreover, the bubble plot in Fig. 9 shows the research trend in machine learning approaches based on the analysis of the selected studies. In the plot, the Y-axis depicts the machine learning approaches whereas the X-axis shows the year of the studies. The plot makes it clear that research on approaches such as SVM, DT, and RF began to rise in 2016, gaining more attention in 2016, 2017, and 2018, but began to decline in 2019. Conversely, DNN and clustering approaches maintained a consistent trend between 2016 and 2019, and both approaches were also proposed in 2021. Thus, the research domain is currently active and is likely to thrive in the years to come.

RQ3
What datasets do researchers employ in machine learning algorithms for detecting and combating security threats and attacks in the BYOD environment?
To answer RQ3, this study identified the datasets utilized in the selected studies and classified them into 14 categories. Table 7 lists the identified datasets and the corresponding studies that used them. Figure 10 depicts the percentage distribution of the datasets used in the selected studies. It is clear from Fig. 10 that malware samples and benign apps (27%) and network traffic data (15%) are the most used datasets in the selected studies. The next most used datasets are APK files (10%) and publicly available datasets (10%). The presence of publicly available datasets shows that some researchers in the domain did not collect their own datasets for their experiments but instead utilized publicly available datasets such as NSL-KDD and UNB ISCX (Table 4 provides the details). Figure 10 also shows that datasets such as Apache Spark, phone records, email data, IoT node data, smartphone sensor data, URL sessions, and mobile botnet data are the least used datasets in the selected studies, with 3% each.

Fig. 8 Machine learning approaches percentage distribution for security mitigation
Moreover, Fig. 11 illustrates the bubble plot of the datasets used in the selected studies. In the plot, the X-axis represents the year of the studies whereas the Y-axis represents the datasets used. The plot shows an increase in the number of studies utilizing malware samples and benign apps, network traffic, APK files, publicly available datasets, and biometric sample data, as these gained researchers' attention between 2016 and 2020. In contrast, there are fewer studies on the utilization of email data, phone record data, and Apache Spark data.

RQ4
What are the evaluation metrics that are employed to evaluate the performance of machine learning algorithms for detecting and combating security threats and attacks in the BYOD environment?
Evaluation metrics are the performance measures used to evaluate machine learning implementations for BYOD security threats and attacks. As can be seen in Table 8 of this study, 16 evaluation metrics were identified: ACC, PRE, REC, F-M, AUC, FPR, TPR, FAR, FRR, ACE, FA, TNR, FNR, ERR, RMSE, and Kappa statistics. Figure 12 shows the percentage distribution of the evaluation metrics across the selected primary studies. As can be seen from Fig. 12, the studies focused mostly on ACC, PRE, REC, F-M, and AUC, with 31%, 17%, 16%, 13%, and 7% respectively. Of the five metrics, ACC stands out with the highest percentage. These metrics are gaining researchers' attention in the domain. It should also be noted that ACE, FA, TNR, FNR, ERR, RMSE, and Kappa statistics each have a very low share of 1%, which shows that those metrics received equally little attention in the selected studies. Figure 13 presents the bubble plot of the evaluation metrics used to evaluate the machine learning implementations in the selected primary studies. In the plot, the Y-axis represents the evaluation metrics and the X-axis represents the years. As demonstrated in Fig. 13, attention towards ACC, PRE, REC, F-M, and AUC began to increase in 2016, rising with the number of publications. However, attention began to fall in 2020, as the publication trend started to decline.

Table 6 Machine learning approaches implementation for BYOD security threats and attacks

Discussions of findings from the systematic mapping review
This SMR adopted both the Petersen and Kitchenham guidelines to provide a comprehensive overview of studies on machine learning approaches for mitigating security threats and attacks in the BYOD environment. In this study, a total of 753 articles published from 2012 to 2021 were initially obtained. After the screening process, 40 primary studies were finally selected for the SMR. The SMR was conducted by answering four research questions under four classification schemes: threats and attacks, machine learning approaches, datasets used, and evaluation metrics.

Threats and attacks
In threats and attacks, the results of this SMR indicate that malware attacks are gaining considerable attention in the BYOD environment, with 65% of the selected primary studies. Malware is a term used to identify an application that executes intentional malicious payloads on a target device or network (Aslan and Samet 2020). A malware attack involves malicious applications that can attack corporate applications as well as mobile devices; such applications contain code that breaches the security of the mobile device or its data. There has been a rapid increase in mobile malware since 2011 (Chang et al. 2014), and malware is regarded as the most hazardous threat targeted at corporate data. In a BYOD environment, malware manipulates employees' mobile devices and uses them to steal confidential data (de las Cuevas et al. 2015) and direct banking transactions (Romer 2014). When malware affects a device, it leads to the exposure of confidential data, granting the attacker an opportunity to assume a corporate identity. Besides compromising employees' devices, mobile malware also attacks corporate applications and renders them useless. It usually poses as a legitimate corporate application in which malicious code has been embedded. There are several types of malware, namely worms, viruses, ransomware, rootkits, and Trojans (Aslan and Samet 2020). Malware can cause damage to an institution through activities such as the theft of confidential information and impersonation of the corporation (Olalere et al. 2015). Malware installed on BYOD devices can bypass conventional security mechanisms while communicating with external nodes. Olalere et al. (2015) observed that malware is disguised as normal applications with hidden malicious code, which infects devices when users visit compromised sites.
In addition, the result of our mapping shows that those issues are relatively dominant in the research domain. On the other hand, persistent threats, data leakage, data theft, DoS attacks, intrusion attacks, untrusted networks, and computational overheads each account for 2% of the selected studies, which shows that they have received less attention in the field.

Machine learning algorithm implementation
The study also examined the machine learning algorithms implemented for security threats and attacks in the BYOD environment. It was found that SVM, DT, RF, ANN, and KNN are the most considered machine learning algorithms, all of which are supervised learning algorithms. Thus, the result shows that supervised learning algorithms are the ones mostly used for detecting and combating security threats and attacks in the BYOD environment. Supervised learning uses labelled training data to help the system learn, allowing outputs to be categorized according to the input. In supervised learning, the model receives labelled training data and uses that information to learn how to classify incoming observations (Eke et al. 2019a, b). The algorithm learns from the available training data and applies what it has learned to real data. SVM is a supervised learning model that constructs a classifier based on statistical learning theory. The classification task requires splitting the data into a training set and a test set. In SVM, a hyperplane, determined by the training points known as support vectors, is used to separate the data points of two classes with the widest possible margin between them (Cristianini and Shawe-Taylor 2000). A DT is an instance-based inductive learning algorithm. A decision tree is trained on instances and classifies them according to the values of their attributes. One flaw of the decision tree algorithm is over-fitting. This issue can, however, be mitigated by using several classifiers, as in the random forest, which builds and trains different trees on subsets of the training set and then combines the predictions of the trees. RF, on the other hand, is an ensemble learning model that uses sub-training sets to construct decision trees.
Each decision tree in the forest then classifies the input vector, and the class predicted by the most trees is chosen. RF addresses the over-fitting issue and attains better prediction than a single decision tree (Fernández-Delgado et al. 2014). A neural network is a learning model whose operation is modelled on the function of the nervous system in the human brain. An ANN is made up of three kinds of layers: the input layer, the hidden layer, and the output layer. The input and hidden layers are made up of various nodes, whereas the output layer contains just one node (Eke et al. 2019a, b). Each unit in the network is connected to other units and has a summation function that integrates all of its input values. K-NN is another supervised learning model that is used to solve both regression and classification tasks. The model relies on a similarity measure and classifies data by a majority vote of the nearest neighbors. Thus, a data point is allocated to the class that is most common among its nearest neighbors, and choosing a suitable number of neighbors can enhance the classification accuracy. The analysis of the SMR showed that these learning algorithms accounted for a total of 68% of the selected primary studies, which shows increasing attention to those machine learning approaches for security mitigation in the field. Conversely, GDA, LDC, Ensemble, LR, BN, and RT have a percentage distribution of less than 20% of the selected studies, which shows less research and declining consideration in the domain.
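The majority-vote rule used by K-NN can be sketched in a few lines. The two-dimensional feature vectors, the labels, and the function name `knn_predict` below are hypothetical; Euclidean distance is assumed as the similarity measure, although other measures are possible.

```python
from collections import Counter
from math import dist

def knn_predict(training_data, query, k=3):
    """Label the query with the most common class among its k nearest training points."""
    neighbours = sorted(training_data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled samples: (feature vector, class).
training_data = [
    ((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"), ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "malware"), ((5.2, 4.9), "malware"), ((4.8, 5.1), "malware"),
]
label_a = knn_predict(training_data, (4.5, 4.7))
label_b = knn_predict(training_data, (1.1, 1.0))
print(label_a, label_b)
```

The same voting idea underlies the random forest, where the voters are decision trees rather than neighbouring data points.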

Datasets Utilization
With respect to dataset utilization for machine learning approaches to detecting and combating security threats and attacks in the BYOD environment, the systematic mapping identified and classified 14 categories of datasets in the selected primary studies. The review findings show that the datasets could be homogeneous or heterogeneous. Researchers have the option of collecting their own dataset or utilizing publicly available datasets, and many studies utilized publicly available datasets. The review also indicated that the most used dataset in the selected studies is the malware and benign samples dataset. The analysis shows an increase in the number of studies that utilized malware samples and benign apps, network traffic, APK files, publicly available datasets, and biometric sample data, which together obtained the highest percentage distribution, 69% of the selected studies. This shows that those datasets gained researchers' attention in the domain, especially from 2016 to 2020, as the analysis showed a rising trend in those years. In contrast, there are fewer studies on the utilization of email data, phone record data, and Apache Spark data.

Performance metrics
The study also provided the results of the evaluation metrics employed to evaluate the performance of the machine learning approaches in detecting and combating security threats and attacks in BYOD environments. The result of the analysis shows that ACC, REC, PRE, and F-M were the most used in the selected primary studies. However, these measures might not be sufficient to accurately assess the performance of a classifier, because class imbalance is common across the datasets in the chosen research. Due to its applicability in measuring the classification performance related to a specific class, AUC would be the ideal choice in this circumstance (Provost and Fawcett 1997; Provost et al. 1998). Additionally, compared to F-Measure, AUC is more resilient to skewness in datasets because it is built on the TPR. The findings from the study analysis indicated that ACC, REC, PRE, and F-M were utilized in 77% of the selected primary studies, showing that they are the standard metrics that most researchers rely on when evaluating machine learning approaches for security threats and attacks in the BYOD environment. In contrast, less than 30% of the selected studies utilized AUC, FPR, TPR, FAR, FRR, ACE, FA, TNR, FNR, ERR, RMSE, and Kappa statistics as evaluation metrics, which shows that they are not commonly used for evaluating machine learning approaches to security threats and attacks in the BYOD environment.
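The effect of class imbalance on these metrics can be seen in a small worked sketch. The data below are hypothetical (95 benign and 5 malicious samples), the helper names are illustrative, and returning 0.0 for an undefined precision or recall is an assumed convention. AUC is computed here in its rank-based form, as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN), treating label 1 as the positive (attack) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a ratio is undefined.
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f_m = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F-M": f_m}


def auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced test set: 95 benign (0) and 5 malicious (1) samples.
y_true = [0] * 95 + [1] * 5
always_benign = [0] * 100             # a classifier that never raises an alarm
constant_scores = [0.0] * 100         # its (uninformative) scores
separating_scores = [0.1] * 95 + [0.9] * 5  # scores that rank every attack first

metrics = basic_metrics(y_true, always_benign)
print(metrics)
print(auc(y_true, constant_scores), auc(y_true, separating_scores))
```

In this sketch the classifier that never flags an attack still reaches 95% accuracy, while its recall is zero and its AUC stays at the chance level of 0.5, which is why accuracy alone can be misleading on skewed datasets.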

Gaps in the current research trend based on the findings
The research gaps identified from the findings of the current research are as follows.
• In security threats and attacks, persistent threats, data leakage, data theft, DOS attack, intrusion attack, untrusted networks, and computational overheads have received less attention in the study field, which serves as a gap for researchers who want to conduct research in the domain.
• In machine learning algorithms, it was found that the most employed algorithms are supervised learning algorithms, while other algorithms such as GDA, LDC, Ensemble, LR, BN, and RT received less attention, which serves as a research gap. Thus, new researchers in the field can focus on those algorithms. More generally, investigations of unsupervised, semi-supervised, and reinforcement learning are needed in this research domain.
• In datasets, few studies utilized publicly available datasets, while some researchers collected their own datasets. The research gap here is the need for more publicly available datasets for researchers in this field of study.
• In performance metrics, there is a gap in utilizing FPR, TPR, FAR, FRR, ACE, FA, TNR, FNR, ERR, RMSE, and Kappa statistics for evaluating machine learning performance in BYOD security implementation. Thus, evaluation with such metrics is required in further research.

Threats to validity
Like other secondary study methods, this systematic mapping study is subject to some validity threats. Several risks need to be considered to ensure the validity of this systematic mapping review. This section describes the threats and their mitigation mechanisms, which relate to the search criteria, the online databases, and the selection criteria (inclusion and exclusion) (O'Donovan et al. 2015).

The search criteria
When searching academic databases, the key concern is the definition of valid search strings. Thus, the formulation of the search string is a threat to the validity of this research [100]. To alleviate the threat, this study's search string was created by employing the PICO criteria [30], which are the standard and the most widely used in systematic mapping studies. This helped in retrieving the required articles in the search results and thereby alleviated the threat.

Digital databases
The chosen databases, namely ACM Digital Library, IEEE Xplore, Springer, Google Scholar, and Science Direct, pose a threat to validity because some related studies may be absent from them. We could have overlooked studies that were available in other databases but did not turn up in our Google Scholar searches. This search approach might have affected the size of our sample, leading to a threat to validity. To lessen selection bias throughout the paper selection procedure, we used a double-blind screening approach. However, this technique leaves some residual bias, which could threaten internal validity. To perform a thematic analysis on the chosen papers, we adhered to the recommendations of Cruzes et al. (2011). Our method of coding and theme generation was inductive and reflexive, which is thought to lessen the influence of individual opinions on the analysis (Alabood et al. 2022; Braun et al. 2019). Thus, this study drew on five different databases to alleviate the threat.

Inclusion and exclusion criteria
In this study, the selection criteria were defined based on the scope of the study and formulated through discussion within the research team. However, any rule for identifying the primary literature to review carries a threat: relevant studies may be overlooked by the search when they use terms that differ from the criteria. Nevertheless, the search terms utilized in this study (bring your own device, machine learning approach, security threat, and attacks) are conventional, well-accepted, and well-defined terms, which should reduce the number of unrelated studies. Furthermore, as the research focused on the machine learning approach for BYOD security threats and attacks, there is less concern about including studies that are only loosely related to the field. Thus, this form of threat can be mitigated by refining the query iteratively so that the search strategy produces relevant studies.

Concluding remarks and future direction
The advancement in mobile computing has drawn researchers' interest to one of the innovative, ubiquitous paradigms called bring your own device (BYOD) (Ballagas et al. 2004). The BYOD paradigm enables employees to bring their own mobile devices and join the organization's network, enhancing flexibility, productivity, and mobility for the employees. Despite the numerous benefits that BYOD offers to both employees and organizations, security issues remain the major challenge in organizational settings. A considerable number of studies have been conducted and published on BYOD with great interest in security threats and mitigation mechanisms. However, a detailed review of the security solution mechanisms has not been emphasized. Moreover, none of the existing reviews focused on the application of the machine learning approach as a mitigation mechanism for security threats and attacks in the BYOD environment. Besides, none of the existing studies demonstrated the ongoing trends in the domain, the datasets utilized for the implementation, or the evaluation metrics employed to evaluate the performance of the existing solutions.
This study presents a comprehensive systematic mapping review by highlighting the current trends in the implementation of machine learning approaches for mitigating security threats and attacks in the BYOD environment based on the existing primary studies published between 2012 and 2021. The SMR study was carried out by addressing four research questions. Out of the 753 studies that were initially retrieved from 5 different academic databases, 40 primary studies were selected after applying the selection criteria. The SMR was conducted on the primary studies by exploring the existing security threats and attacks on the BYOD environment. Moreover, the machine learning approaches implemented and the evaluation metrics employed to evaluate their performance were presented. In addition, the datasets that are used to implement the machine learning models were identified and reviewed.
The SMR result demonstrates a rise in the number of investigations into malware and unauthorized access among the existing security threats and attacks in the BYOD environment. Moreover, concerning the machine learning approaches used as mitigation mechanisms for security threats and attacks in the BYOD environment, the mapping result shows that supervised learning approaches such as SVM, DT, and RF gained many researchers' attention. However, the investigation also shows that there is a research gap in other machine learning approaches such as ensemble learning, clustering, LR, NB, BN, and DNN, since they have not received much attention in the domain. In addition, the study indicates that there is a need for comprehensive publicly available datasets for the implementation of machine learning-based solutions, since most researchers collect their own datasets, which can be a tedious task. Moreover, four standard performance evaluation metrics have been extensively used by researchers to evaluate models. However, researchers in the domain can also consider other metrics. Thus, the SMR sets the stage for new research on machine learning-based approaches to BYOD security and offers valuable insight into the study field. Researchers can therefore use this SMR to identify research gaps in the domain. In the future, the authors will consider the following research.
• Malware classification for Apple, Android, and Unix platforms, and integrating it with mail services like Gmail, using publicly available malware datasets.
• Employing more enriched meta-data features, combining a graph kernel with a set of kernels, and applying a semi-supervised machine learning approach for implementation.
• Training the neural network with large datasets to separate normal and abnormal devices by employing ping, Skype, and iperf applications with TCP, UDP, and ICMP protocols.
• Investigating other classification approaches together with the implementation of the Genetic programming approach to address the problem of imbalance by using a modification of the cost-related misclassification.

Lessons learned
The lessons obtained in conducting systematic literature and mapping reviews cannot be overemphasized. This subsection outlines the lessons that we learned in conducting this systematic mapping review:
• The ability to thoroughly search the literature in databases and identify suitable papers.
• Statistical data analysis techniques such as bubble plots, of which we had no prior knowledge.
• How to perform a quality assessment on the literature, as well as a screening process to identify potentially relevant literature.
• Drawing skills, gained by learning drawing tools such as draw.io.
• Improved writing skills, as we took part in writing up the literature.