Mobile App Usage Pattern Prediction Using Hierarchical Flexi-Ensemble Clustering (HFEC) for Mobile Service Rating

The mobile app market is growing rapidly worldwide. Mobile app marketers need to understand the requirements and demands of customers and meet their expectations, which in turn drives growth, profitability, and innovation. The main aim of this research is to analyze customer interests and preferences regarding mobile service providers. This paper proposes a clustering model named Hierarchical Flexi-Ensemble Clustering (HFEC), which provides a final result with robustness and improved quality. Before clustering, unwanted features are removed by using a Genetic Algorithm based on the Collective Materials (GACM) technique. Customer preferences are analyzed by clustering mobile usage patterns. The app usage pattern is analyzed in terms of the most frequent word, rating category, rating character count, rating word count, and content-based rating in the Google Play Store apps dataset. Finally, the results are compared with existing methods to assess the performance of the proposed method. The comparison is based on the average hit rate at different cache sizes. The work concludes with app pattern prediction in the form of clustering for app marketing services, from which marketers can analyze customer preferences and satisfaction.


Introduction
The smartphone has become a constant companion because of its flexible services, and the growing number of mobile applications accelerates smartphone adoption. When a user searches for apps in the Google Play Store, the list of available apps is displayed based on user rating and app name. At the end of 2018, more than 5.6 million applications were available in the Google Play Store and the Apple App Store. Furthermore, the increased number of mobile apps will increase installs and downloads. Generally, users prefer to download highly rated apps, because the rating reflects how satisfied other users are compared with the rest of the applications. Popular apps have billions of active users and downloads. With millions of apps available, one of the important challenges is to capture the attention of users. To increase app quality and obtain high user ratings, companies and developers use strategies such as (1) allowing users to give a five-star rating directly, (2) allowing free-text descriptions, and (3) allowing requests for new features. The app store market covers categories such as books, music, entertainment, games, and movies. Developers also have the opportunity to make changes and add new features in a new release based on user reviews and ratings. In the mobile app market, developers can access crowdsourced information about their apps, such as app reviews, app usage, ratings, and app-related posts. To improve user preference, app quality, and user experience, developers should understand how people use mobile applications day to day [1,2]. Various researchers have analyzed app usage patterns based on data traffic, number of unique users, number of downloads, and length of network access.
A. R. Nadira Banu Kamal-deceased.
The study in [3] suggested correlation analysis to derive the app usage pattern. The correlation between app usage and device model was derived, which is important for determining the behavior of various users. The authors of [4] analyzed the temporal behavior of users with mental scale analysis, and a clustering method was used to characterize archetypical user engagement. The work in [5] introduced a spatio-temporal event detection approach, after which clustering is performed based on contextual data. It improves information retrieval by reducing the search space to the related clusters. The study in [6] demonstrated a matrix factorization method to detect the app usage pattern and user intents.
Data mining supports the mining of app usage information in two ways. First, it learns the app usage patterns of users, which can improve the quality of the app. Second, it extracts user profiles and suggests highly relevant apps to users. The widely used data mining techniques are sequential pattern discovery, association discovery, clustering, classification, and forecasting. App marketing can be enhanced by mining user feedback; this analysis helps to learn what users think about the app and what they recommend for its development. With the usage pattern, the app store market recommends apps to users based on their preferences. The studies in [7,8] focused on analyzing the preferences of mobile users by mining app management activities. They then derived behavioral patterns from those activities to detect user preferences more accurately.
The existing methodologies have the following limitations:
• Inefficient prediction results
• Most prediction systems consider only the location
• High time consumption
The aim of this research is to identify customer preferences and satisfaction in the form of mobile app usage patterns. To achieve this, the proposed system uses a clustering strategy in a highly efficient way. The system is used to understand the development and customer preferences of various mobile service providers. Additionally, we analyze the app usage pattern in terms of the most frequent words, rating category, rating character count, rating word count, and content-based rating. We developed an implementation that clusters customer satisfaction based on customer preferences. Here, the clustering accuracy is enhanced by using the similarity index and merge score, and the time taken to determine the cluster position is very low. The expected outcome of this research is a set of patterns that indicate customer satisfaction for any mobile service. The motivation of this research is as follows:
• To achieve efficient feature selection by a Genetic Algorithm based on the Collective Materials (GACM) approach.
• To evaluate the app usage pattern by the Hierarchical Flexi-Ensemble Clustering (HFEC) technique.
• To increase the efficiency of the clustering method by a novel similarity indexing formula.
• To effectively cluster customer satisfaction based on the analysis of mobile app usage patterns.

Organization of the Paper
The remainder of this paper is organized as follows: Sect. 2 reviews the various clustering methods used to determine the mobile application pattern. Section 3 describes the proposed Hierarchical Flexi-Ensemble Clustering (HFEC) technique for determining the mobile application usage pattern in the Google Play Store. Section 4 presents the performance measures of the proposed system, and Sect. 5 concludes the proposed work.

Review of Literature Work
This section describes existing work on clustering methods for determining the mobile application pattern. We also discuss the popularity prediction of mobile apps based on their behavior. The authors of [9] presented a crowd listening method for the release planning of mobile apps. The mobile app market keeps growing and was anticipated to exceed 100 billion dollars in 2020. To succeed in this competitive environment, developers need to create and maintain high-quality apps with new features. Application marketplaces allow users to provide reviews and comments. These reviews help other users choose apps and also give developers valuable information for reporting failures and proposing new features. Developers could read the user reviews manually to analyze this source of information. This research proposed the CLAP (Crowd Listener for release Planning) technique, a web application that assists developers in this task in three steps:
1. Characterizing the user reviews based on the information in their comments
2. Clustering related reviews
3. Prioritizing the clusters of reviews
They evaluated the steps of the CLAP process, and it showed high accuracy in characterizing and clustering reviews. The tool has been applied in industrial environments [10].
The study in [11] suggested a popularity prediction model for understanding mobile application usage. The rapid growth of network services and smart devices drives mobile application development. Some popular applications affect the network capacity and user experience at base stations (BSs). However, it is critical to analyze the app usage pattern and traffic consumption through BSs in a metropolitan area. Mobile big data was collected through the network interfaces, which facilitates a data-driven approach to feature characterization. Based on the characteristics of apps, traffic, generated logs, and points of interest, an edge caching approach was established. Different temporal characteristics were examined, and the traffic and logs of different BS clusters were analyzed. The top N popular apps in a certain period were predicted for the various clusters. This is advantageous to network operators, who can evaluate the traffic distribution of the various apps.
The research in [12] used a feature extraction process for analyzing important features. The Google Play Store platform allows users to give reviews of apps. User reviews and comments help to retrieve the requirements of potential apps. Various researchers have addressed mobile app feature extraction from user reviews. Extracting the important features is the main challenge. To address this issue, this research proposed finding collocations to extract mobile app features from reviews and extracting infrequent features from reviews based on extraction rules. It then compared the similarity of all features retrieved from reviews with the app descriptions. The similarity measurement covers three levels: 1. single term, by the corresponding feature of each term; 2. synonym, by reference to WordNet synsets; and 3. sentence, based on an estimate of cosine similarity and a lexical-semantic vector. The results were evaluated with the performance metrics of precision and recall.
The authors of [13] proposed a fine-grained mobile app clustering model with Retrofitted Document Embedding. To form clusters automatically with no predefined categories, this research initializes the clusters in terms of title keywords and then combines similar clusters. The proposed method separates out an accurate clustering step based on titles to improve the performance of the clustering model. In the results, the tagged set produced by the accurate clustering step was evaluated. This tagged set was further used for learning a high-performance document vector. The evaluation showed that the proposed method decreased the entropy value by 1.18 and increased the accuracy value by 0.19 compared to the k-means clustering and SVM algorithms.
The authors of [14] proposed user behavior clustering algorithms for mobile applications. User behavior clustering divides users into multiple groups. This research proposed two algorithms that operate on user behavior data: a fuzzy clustering algorithm and a two-layer clustering algorithm. The first-layer clustering method extends the DBSCAN algorithm by replacing the neighborhood radius with a designed similarity between user sessions, and it optimizes the cluster merge condition. It then retrieves the feature vectors of the user sessions and applies an enhanced FCM algorithm. This method initializes the membership matrix to speed up the weighted value and convergence and to address local optimum issues. Both algorithms perform effectively, and the clustering methods can be applied to various analysis scenarios.
The research in [15] explained a two-layer clustering method for mobile application analysis. This framework provides a macro perspective on mobile CRM applications. Mobile application marketers use data analysis to sell their company's products and services to targeted customers through accurate marketing. They periodically track changes in the group structure by using the established clustering model. This research only investigated clustering with data usage behavior, mobile voice, customer base data, and customer contributions. It was not designed to improve the selection of customer variables for grouping.
On the other hand, the authors of [16] predicted app popularity with retention rates and trend filters. Commonly, the popularity of a mobile app is measured by installations, downloads, and user ratings. The challenge with these measures is that they reflect usage only indirectly. Retention rates, which define the number of users who continue to use an app after installation, have been recommended for determining app life cycles. They conducted a large-scale study of usage trends and retention rates on an app usage dataset of more than 213,667 apps and 339,842 users. The analysis showed that applications lose 65% of their users in the first week, compared with 35% for the top 100 applications.
Likewise, the authors of [17] suggested clustering based on mined textual features to benefit developers and users. For the latent clustering of mobile apps in an app store, they proposed a novel method of similarity calculation based on claimed behavior. In the proposed system, the features are extracted by using ontological analysis. The retrieved attributes were used to cluster the apps with agglomerative hierarchical clustering. They evaluated the proposed method on 17,877 apps from the Google and BlackBerry app stores in 2014. The developed system improved the existing categorization quality from 0.02 to 0.41 for BlackBerry and from 0.03 to 0.21 for Google. Strong Spearman rank correlations were identified among the different apps: 0.99 for BlackBerry and 0.96 for Google.
The authors of [18] proposed a community-based diffusion method with spectral clustering and a Markov chain for mobile social networks (MSNs). Information exchange is an important challenge in emerging MSNs. This research addressed the problem of determining the top-k influential users, that is, the users who spread information effectively in the network. To reduce the spreading period, this paper used the k-center problem, which is NP-hard. In the end, they selected the top-k influential patterns in each community. The proposed method was evaluated on MSNs using NS-2 simulation.
The article in [19] described a clickstream tool for modeling online user behavior. Online services depend heavily on user involvement. Whether for online social networks or crowdsharing sites, understanding user behavior is important but difficult. To capture the hierarchy among the user clusters, iterative tuning guided by a partitioning approach is used, and intuitive functionality is provided for capturing and visualizing user behavior.
With the aid of the visualization tool, service providers can examine dominant user habits and classifications in summary while visualizing fine-grained behavior patterns within each category. The tool can capture unexpected or unknown behaviors because it does not focus on particular user groups. Its efficacy was determined in case studies on two large-scale online networks. Identification and resources were shared with the data science team, and highly positive comments were obtained.
On the other hand, the authors of [20] introduced the CHABADA approach to effectively identify applications whose behavior is unexpected given their description. Several instances of false and misleading ads were found, and an effective detector for previously unknown malware was obtained as a side effect. The work opened up software mining collections for application mining and empirical software engineering, and new opportunities for automated natural language studies of specifications. The model provided a range of insights into Android apps. Google could adopt better principles against inaccurate or deceptive ads. User permission is disabled for Android users. CHABADA highlights such inconsistencies so that they are easily observed.
Similarly, the authors of [17,21] proposed a novel methodology that evaluates app similarity based on claimed behavior. Categorizing software systems by their functionality leads to many advantages for both users and developers. The specified characteristics are extracted by ontological analysis using information retrieval, and the attributes of the apps are described. The applications are clustered on those attributes using agglomerative hierarchical clustering. In total, 17,877 applications were mined from the Google and BlackBerry app stores. The correlation for BlackBerry and Google with respect to size increased with finer granularity and number of apps. A positive correlation was obtained between the finest granularity and the mean scores of the raters. The clustering and feature selection strategy was explored to support reverse engineering tasks in domain analysis. Finally, the clusterings of various app stores were compared at various levels of similarity.
Furthermore, the authors of [22] proposed a metaheuristics-based clustering ensemble method. For clustering ensembles, this analysis used an improved generation process and a co-association matrix in the co-occurrence method. The key component analysis is used to enhance efficiency. The main issue is the mobile application, and the marketing strategy for the actual application is therefore based on the better outcome.
The article in [23] analyzed the customer satisfaction of mobile apps by using data mining methods. Various data mining techniques have been used to investigate customer data. This research showed a number of data mining techniques that can be applied in the analysis of customer satisfaction. The machine learning methods were applied to the dataset following the CRISP-DM methodology. The modeling techniques included Naïve Bayes, logistic regression, and decision tree. Predictive performance was essential for analyzing what customers like about the app. The results achieved 90% accuracy, and the feature selection methods improved the overall accuracy as well as the negative class precision.
Similarly, the authors of [24] presented data mining applications for pattern recognition. In telecommunications, mobile subscribers are affected by data traffic every day. In the network, the data traffic exhibits certain behavioral characteristics. Data mining applications help to analyze the features of the data traffic. This paper proposed exponential binning as a data pre-processing technique to minimize noise and smooth the data. It then used the k-means algorithm to cluster the data traffic stream and mine the behavioral characteristics of subscribers from the clusters.
Moreover, the authors of [25] analyzed mobile application usage and predicted location based on clusters. App usage prediction and smartphone user location are important problems in current research. Smartphone sensors such as GPS, accelerometer, gyroscope, microphone, camera, and Bluetooth make it easy to collect user behavior data for specific analyses. However, the growing number of apps and the differences in user behavior make prediction a challenging task. The work used a dataset of 30,000 users from a leading IT company in China, converted the data into frequency, recency, and monetary variables, and finally performed clustering analysis to analyze user behavior. For every cluster, predictive models were developed by using the training dataset and testing dataset.
The research in [26] investigated young children's real app use on a large aggregated dataset from Australian primary schools. The dataset covers 15,000 Android devices over three years. Association analysis and clustering analysis were employed to analyze the app usage patterns. The evaluation results showed five distinct app use patterns. The implications of the various use patterns for teaching and learning were discussed.
Besides, the authors of [27] used data clustering methods to predict the temporal data characteristics of various mobile applications in wireless communications. Existing research focused only on the analysis of mobile traces and call records. This research concentrates on the usage of mobile applications to characterize and detect their behavior. They used mobile application usage logs to characterize the temporal behavior of mobile applications at a network service provider. It showed that the resulting classes can be used to analyze the future usage of specific mobile applications via distance calculation and similarity comparison techniques.
In addition, the authors of [28] applied two popular clustering algorithms, fuzzy c-means and k-means. Clustering is a suitable method for identifying hidden groups in large datasets. The research was conducted on a mobile app dataset with 7196 applications, which were clustered by using the two clustering methods. After pre-processing steps such as standardization and outlier removal, the clustering algorithms were run with various parameters to reach the highest performance and optimal values. The results showed that the fuzzy c-means algorithm provides higher quality than the k-means algorithm.
The authors of [29] determined the usage behavior of various applications across different kinds of smartphones. Whereas traditional research analyzed only data from smartphone-based user reports, this research analyzes elementary characteristics of smartphone users. The research was conducted on a dataset with 106,762 Android users and determined 382 distinct types of users based on their usage behavior by using a feature ranking selection technique and a two-step clustering technique.

Proposed Work
This work proposes a novel method named Hierarchical Flexi-Ensemble Clustering (HFEC) for determining the mobile application usage pattern in the Google Play Store environment. Additionally, a novel feature selection strategy is used to select the more relevant features from the pre-processed dataset. App stores contain various types of information from the developer's side, such as app description, app features, downloads, ratings, and comments. The prediction of the mobile app usage pattern is most important for mobile service rating. As shown in Fig. 1, the proposed method consists of three processes:
• Pre-processing
• Feature selection by using GACM
• Pattern prediction by using HFEC

Pre-processing
The initial dataset contains missing values, failures to load, corrupt data, incomplete extractions, uninformative parts, and a lot of noise. Pre-processing is the cleaning of text and data for further processing, and it improves the quality of the data. To improve the performance of the proposed method, the noise in the dataset should be reduced. First, the missing values, null values, and noise in the dataset are identified. Null values are then handled by dropping the affected rows or columns. Missing values are handled by computing the mean or median of the feature and substituting it for the missing values. The resulting dataset comprises app name, id, review, category, last updated version, Android version, present version, price, and type. This pre-processing is required before any further processing of the dataset.
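The two cleaning steps described above (dropping records with null values and imputing missing values with the mean or median) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the record layout, and the sample values are assumptions made for the example and are not taken from the original implementation.

```python
from statistics import mean, median

def drop_incomplete(rows, required):
    """Drop rows in which any of the required columns is missing (None)."""
    return [r for r in rows if all(r.get(c) is not None for c in required)]

def impute_missing(rows, column, strategy="median"):
    """Return copies of the rows with missing values in `column`
    replaced by the mean or median of the observed values."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    if not observed:
        return [dict(r) for r in rows]  # nothing to impute from
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [{**r, column: fill if r.get(column) is None else r[column]}
            for r in rows]

# Hypothetical records; None marks a missing value.
apps = [
    {"app": "A", "category": "GAME", "rating": 4.0},
    {"app": "B", "category": "TOOLS", "rating": None},
    {"app": "C", "category": None, "rating": 3.0},
    {"app": "D", "category": "SOCIAL", "rating": 5.0},
]

# Drop rows without a category, then fill missing ratings with the median.
cleaned = impute_missing(drop_incomplete(apps, ["category"]), "rating", "median")
```

Here the row without a category is dropped, and the missing rating of app B is replaced by the median of the remaining observed ratings.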

Feature Selection: Genetic Algorithm Based on Collective Materials (GACM)
After pre-processing the dataset, feature selection is performed by using a genetic algorithm based on collective materials. The dataset contains a large number of features, and the most important ones have to be selected. Not all independent variables should be used to build the predictive model, because the error of the model has to be minimized. Relevant features carry important information about the output, whereas irrelevant features carry only a minimal amount of information about the output. To achieve this more efficiently, the genetic algorithm is used based on collective materials. The main advantage of this approach is that it can handle a large dataset with many features. The optimization problem is to determine those input features that carry the largest amount of information about the output. To measure the information carried by random variables, entropy and collective materials are introduced in this research. Here, we use a new formula for computing the conditional collective material between a candidate feature and a given subset of features in the local search process. Generally, entropy measures the uncertainty of a random variable. The method relies on the joint entropy and the conditional entropy of two discrete random variables. As shown in Fig. 2, the collective material is defined as the information common to two random variables x and y. Two random variables with a large collective material are closely related. If the value of the collective material is zero, the two variables are unrelated. For discrete random variables, the entropy, the collective material, and the conditional collective material are estimated from the data. The local and global searches of feature selection are done by maximizing the collective material.
Fig. 1 Overall flow of the proposed system
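For reference, the standard information-theoretic quantities that this passage appears to refer to can be written as follows. This is a sketch under the assumption that the "collective material" corresponds to mutual information; capital letters X, Y, Z denote discrete random variables, lowercase letters their values, and p(·) the probability mass functions. The new conditional formula mentioned in the text is not reproduced here; the last line is only the textbook conditional mutual information.

```latex
\begin{align}
H(X) &= -\sum_{x} p(x)\,\log p(x) \\
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y)\,\log p(x,y) \\
H(Y \mid X) &= -\sum_{x}\sum_{y} p(x,y)\,\log p(y \mid x) \\
I(X;Y) &= \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
        = H(X) + H(Y) - H(X,Y) \\
I(X;Y \mid Z) &= \sum_{z} p(z) \sum_{x}\sum_{y} p(x,y \mid z)\,
        \log\frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)}
\end{align}
```

With these definitions, I(X;Y) = 0 exactly when X and Y are independent, which matches the statement that a zero value means the two variables are unrelated.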
Calculation of cluster position by using GACM
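As an illustration of how the relevance of a feature to the output can be scored, the sketch below estimates entropy and mutual information from discrete samples and ranks features by that score. It assumes that the collective material corresponds to mutual information, and it replaces the genetic search with a plain ranking, so it shows only the scoring idea and not the GACM procedure itself. The feature names and sample values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def rank_features(features, target, k):
    """Return the names of the k features sharing the most information
    with the target (simple ranking, not a genetic search)."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical discrete features and a hypothetical rating class as target.
features = {
    "type": ["free", "free", "paid", "paid"],
    "size_bucket": ["small", "large", "small", "large"],
}
target = ["high", "high", "low", "low"]

selected = rank_features(features, target, k=1)
```

In this toy data the feature "type" determines the target completely, while "size_bucket" is independent of it, so only "type" is selected.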

Pattern Prediction: Hierarchical Flexi-Ensemble Clustering (HFEC)
The dataset contains various characteristics of app usage patterns, and the proposed clustering approach effectively supports the search for relevant patterns. Previously, the app usage pattern was analyzed based on data traffic, number of unique users, number of downloads, and length of network access. In this research, the proposed clustering model analyzes the important variables, or pattern predictions, that influence the overall rating of the mobile application.
After feature selection, clustering is done by using the proposed Hierarchical Flexi-Ensemble Clustering (HFEC) method. Ensemble clustering refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and a single (consensus) clustering is sought that is a better fit, in some sense, than the existing clusterings. It provides a final result with robustness and improved quality. In the proposed method, there are multiple options for structuring the clustering process. First, the distance between two clusters is calculated and the cluster center is computed. Then, the similarity between those clusters is calculated with the novel framework. Clusters and ensemble members are handled differently. Each ensemble member contains a single instance and is part of exactly one cluster at all times. A simple layout algorithm is employed to determine the cluster position, or cluster center, based on the computation of an upper limit and a lower limit. The upper limit xhi and lower limit xlo are computed for each cluster. The layout is parameterized with values such as horizontal spacing, line thickness of an ensemble member, and minimum and maximum vertical spacing between clusters. The similarity matrix is then defined by the similarity between two data points u and v: the similarity is 1 if the two data points are assigned to the same cluster and 0 otherwise. The similarity is calculated over the base clusterings (A) for all data matrices, denoted Y′. The proposed flexible ensemble clustering method solves the clustering problem and the dendrogram selection problem.
Dendrogram selection and clustering problem solved by the flexible ensemble clustering approach
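The consensus idea described above can be sketched with a generic co-association approach: each base clustering contributes a similarity of 1 for a pair of points placed in the same cluster and 0 otherwise, the contributions are averaged, and the points are then merged hierarchically by average similarity. This is an assumed, simplified stand-in and not the HFEC similarity formula, which is not given in the text; the function names and the toy labelings are hypothetical.

```python
def co_association(base_clusterings):
    """Fraction of base clusterings in which each pair of points
    shares a cluster (1 per clustering if together, 0 otherwise)."""
    n = len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for u in range(n):
            for v in range(n):
                if labels[u] == labels[v]:
                    counts[u][v] += 1
    m = len(base_clusterings)
    return [[c / m for c in row] for row in counts]

def consensus_clusters(sim, k):
    """Agglomerative merging: repeatedly join the two clusters with the
    highest average pairwise similarity until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[u][v] for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Three hypothetical base clusterings of five data points (label per point).
base = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
sim = co_association(base)
result = consensus_clusters(sim, k=2)
```

In this example points 0 and 1 always share a cluster, as do points 3 and 4, and point 2 sides with points 3 and 4 in two of the three base clusterings, so the consensus groups it with them.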

Dataset Description
The proposed method is evaluated by using the Google Play Store apps dataset from Kaggle [30]. This dataset consists of ranking and review information for 10,840 applications. It contains multiple variables such as app name, app id, category, reviews, rating, number of installs, app size, app type, content rating, price, genres, last updated version, current version, and Android version. The dataset variables are detailed in Table 1.
The mobile applications in use today can be divided into many categories, such as art and design, family, comics, communication, game, maps, social, sports, and shopping. Figure 3 shows the ratings given for the different categories of applications available to users. Applications are also installed by many users every day, and an app with better reviews will be installed by more users. Figure 4 shows the installation count of applications by number of users.

Pattern Prediction Analysis
In the pattern prediction analysis, the most frequent word, rating category, rating character count, rating word count, and content-based rating were analyzed. The proposed cluster model clustered the mobile application pattern with the important variables. Developers introduce their applications to users by name. Commonly used words that appear in many application names include app, mobile, live, pro, and chat. Figure 5 shows the counts of the most frequently used words in the names of the various applications available. From the figure, the word "app" is the one most frequently used by application developers.
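The most-frequent-word analysis of application names can be reproduced in outline with a simple token count. The sketch below is an assumed illustration: the tokenization rule (lowercase alphabetic runs) and the sample names are hypothetical, and the real analysis would run over the app name column of the dataset.

```python
import re
from collections import Counter

def most_frequent_words(app_names, top_n=3):
    """Count lowercase alphabetic tokens across all app names and
    return the top_n (word, count) pairs."""
    words = []
    for name in app_names:
        words.extend(re.findall(r"[a-z]+", name.lower()))
    return Counter(words).most_common(top_n)

# Hypothetical application names.
names = ["Photo Editor App", "Chat App Pro", "Live Chat", "Mobile App"]
top = most_frequent_words(names, top_n=2)
```

For these sample names, "app" appears three times and "chat" twice, so they are returned as the two most frequent words.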
Figure 6 shows the rating of the applications based on their character count. Most ratings by character count fall in the range between 3.5 and 4.5, and a rating of 1 was given only rarely. Figure 7 shows the ratings obtained for the different categories of applications, that is, the ratings given by app users to the various categories. Some categories of apps, such as beauty, sports, education, shopping, and maps, never received the lowest rating. At the same time, some app categories, such as communication, medical, family, and tools, received the lowest rating. Figure 8 compares the existing and proposed methods for the first category in terms of the average hit rate at different cache sizes. In the figure, the N value represents the top N popular app types.
The average hit rate changes with the cache size. The existing and proposed methods are compared in Fig. 9.
In Fig. 10, the proposed and existing methods are compared for the third category based on the average hit rate obtained at various cache sizes.
The average hit rates at different cache sizes for the fourth category are compared in Fig. 11. The graphs for the fifth and sixth categories are shown in Figs. 12 and 13, respectively. The proposed model and method were analyzed based on the most frequent word, the character count ratings, and the ratings obtained for various categories, which are compared with graphical representations. From the compared results, the proposed model and method proved to be better than the existing approach [11].
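To make the evaluation metric concrete, the sketch below computes a hit rate for a cache that holds the top N most popular app types. It assumes that a request counts as a hit when the requested app type is among the N types predicted to be most popular from earlier usage; the exact definition used in the comparison with [11] is not stated in the text, and the request logs here are hypothetical.

```python
from collections import Counter

def top_n_cache(history, n):
    """Cache contents: the n most frequently requested items in the history."""
    return {item for item, _ in Counter(history).most_common(n)}

def hit_rate(requests, cache):
    """Fraction of requests that are served from the cache."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r in cache) / len(requests)

# Hypothetical usage history (for prediction) and later requests (for testing).
history = ["chat", "chat", "chat", "maps", "maps", "game"]
requests = ["chat", "maps", "game", "chat"]

# Hit rate for cache sizes N = 1, 2, 3.
rates = {n: hit_rate(requests, top_n_cache(history, n)) for n in (1, 2, 3)}
```

As the cache size N grows, more of the later requests fall inside the cached set, so the hit rate in this toy example rises from 0.5 to 0.75 to 1.0.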

Conclusion
In this research, a detailed analysis of mobile app usage patterns is carried out on Google Play Store apps. The new framework of Hierarchical Flexi-Ensemble Clustering (HFEC) is proposed to analyze the patterns that influence app ratings. A genetic algorithm based on collective materials was used to remove unnecessary features before the clustering process. A novel formula based on similarity indexing was used to enhance the efficiency of clustering. The performance of the proposed model was evaluated on the dataset taken from the Google Play Store. The comparison of the proposed model with the existing one, based on the most frequent words used in app names, the character count ratings, and the ratings given to various categories of apps, was illustrated with graphs.
In addition, the average hit rates of the proposed and existing methods at different cache sizes were represented graphically for six categories. The main advantage of this approach is that it can handle a large dataset with many features. As future work, sentiment analysis and image prediction can also be performed using more efficient algorithms than the one used in this work.