Trilateral Spearman Katz Centrality Based Least Angle Regression for Influential Node Tracing in Social Network

With the rapid growth of Online Social Networks (OSNs), information dissemination in OSNs has attracted extensive research in recent years. One essential line of this research is Influence Maximization. Most existing work relies on community structure, greedy stages, and centrality measures to identify the influential node set. However, the time consumed in analyzing the influential node set for edge server placement, service migration and service recommendation is ignored in terms of propagation delay. Motivated by this, we concentrate on time-sensitive influence maximization and on maximizing the targeted influence spread. To solve the problem, we propose a method called Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) for influential node tracing in social networks. Two algorithms, namely the Trilateral Statistical Node Extraction algorithm and the Katz Centrality Least Angle Influence Node Tracing algorithm, are used to find the influential node with maximum influence spread and minimal time. Extensive experiments on the Telecom dataset demonstrate the efficiency and influence performance of the proposed algorithms in terms of sensitivity, specificity, accuracy, time and influence spread.


Introduction
With the swift evolution of network technology and the popularization of the Internet, social applications such as WeChat, Weibo and Snapchat produce an enormous amount of network data. Several practical applications need to mine valuable information from this network data. As a result, Influence Maximization (IM) has become an intensely studied topic in social networks, particularly in recent years, owing to its crucial role in a wide range of fields such as network monitoring and product/brand recommendation.
The Community Based Influence Maximization Algorithm (CBIMA) was proposed in [1] to solve issues concerning influence maximization in social networks. With this objective, an influence maximization method was designed that obtained influential nodes by means of community structure, after which the influence distribution difference was presented. To start with, a network embedding-based community detection model was designed, with which the entire social network was split into distinct high-quality communities.
Next, the solution for influence maximization was divided into a candidate stage and a greedy stage. In the candidate stage, the candidate nodes were selected via a heuristic algorithm; in the greedy stage, seed nodes were selected via a modular property-based greedy algorithm. With this, better performance was said to be achieved with minimum running time.
Community Finding Influential Node (CFIN) was designed in [2] with the purpose of selecting optimal users on the basis of the community structure, which in turn maximized the influence spread in the networks. CFIN was split into two stages: seed selection and localized community spreading.
In the first stage, seed nodes were extracted via a community detection algorithm. Here, with the objective of reducing the computational complexity involved in selecting seed nodes, meaningful communities were obtained. Next, the influence spread inside communities was determined, with which the final seed nodes responsible for distribution were analyzed. With this, both the coverage ratio and the running time were said to be improved.
Despite the improvements observed in coverage ratio and running time for obtaining influential nodes, accuracy and sensitivity together with higher influence spread received less attention. To address these issues, Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) for influential node tracing in social networks is proposed in our work.

Motivation
Recently, some research has been carried out to discover influential nodes and to rank users based on the number of followers they have. Both the social network structure and the strength of influence among individuals evolve continuously. Existing influential node tracing techniques find influential nodes for only one static social network, but real-world social networks are dynamic. Therefore, accurate and precise tracking of the influential nodes is required under a dynamic setting. To address these issues, Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) is proposed, comprising several processes to perform influential node tracing in social networks.

Objectives
The main contributions of this paper are summarized as follows:
• To achieve influential node tracing in social networks, Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) is proposed. TSKC-LAR combines the Node Cover Preprocessor model, Trilateral Statistical Node Extraction, Spearman Correlated Normalized Feature Selection and Katz Centrality Least Angle Influence Node Tracing to maximize the accuracy and influence spread under a dynamic social network.
• To reduce the tracing time of the influence maximization node, the Node Cover Preprocessor model is employed to obtain the effective nodes and eliminate the ineffective nodes. Then, feature extraction or node extraction is performed by employing the Trilateral Statistical Node Extraction model to obtain the optimal features or nodes.
• To identify the influential nodes, a Spearman Correlated Normalized Feature Selection model is applied. Finally, Katz Centrality Least Angle Regression is applied to the selected nodes for influential node tracing with a higher rate of sensitivity and specificity.
• We evaluate the performance on a large-scale telecom dataset. The experimental results confirm our theoretical findings and show that the proposed TSKC-LAR method achieves better performance in both influence coverage and running time, along with sensitivity, specificity and accuracy.

Organization of Paper
The manuscript continues as follows. Section 2 presents a considerable reference list for related work in the field. The results of this exploration account for the novelty of the multi-objective proposal of this work. Section 3 describes the Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) method. Section 4 provides the experimental setup, while Sect. 5 presents and discusses the main results obtained. Finally, conclusions are included in Sect. 6.

Related Works
Identifying the most influential nodes in social networks has an extensive range of applications. In recent years, different types of methods have been designed to estimate the influential nodes on the basis of their structural location in the social network. An algorithmic approach based on MapReduce with centrality as a measure was proposed in [3] with the purpose of improving computational efficiency. However, natural life evolution and the influence exerted by peer groups result in temporal patterns. To address this issue, a temporal preference model was designed in [4] to detect network structure changes on the basis of node centrality. With this, strong correlations were said to be established. Another influence ranking model based on the Susceptible Infected Recovered model was proposed in [5] to find influential nodes. From the information transmission aspect, different types of social networks have different information transmission modes owing to the diversity of their functions and user structures. As a result, social networks have become a new channel for viral marketing, used to promote products, innovations and opinions. An enhanced multi-attribute k-shell method was designed in [6], utilizing the iterative information in the decomposition process to rank influential nodes.
Another deep influence evaluation model was proposed in [7] by considering complicated graph structure. This in turn evaluated influential nodes in a computationally efficient manner. However, with the inception of complex networks, this type of influential node identification remains a major issue to be addressed. A machine learning framework was designed in [8] to evaluate the complicated relationship between neighboring and non-neighboring nodes.
In the analysis of influence maximization in social networks, the speed with which the information is spread reduces with two different factors, i.e., elevated time and distance. Therefore, the examination of the transmission of information is of great importance to the supervision and administration of public point of view.
An Influence Maximization algorithm was designed in [9] on the basis of the rate attenuation propagation model. With this, both the accuracy and time were said to be improved. Two new effective algorithms based on centrality measure and local measure were proposed in [10] under the Independence Cascade (IC) and Linear Threshold (LT) models. With this, the time complexity involved in influential node analysis was found to be reduced.
Conventional methods, such as centrality-based and machine learning methods, considered either the structure of the network or the features of the nodes to measure node significance. However, influential significance has to be determined by considering both the network structure and the node features. To solve this issue, a deep learning model was presented in [11] to identify the most influential nodes in a complex network. A graph classification method was applied in [12] for identifying significant nodes.
In recent years, studies on influential communities have identified communities that contain many influential members. Several metrics are involved in measuring influence, yet existing techniques search for influential communities by taking into account only one influence type. In [13], an efficient influential community search method was proposed that identified the influential communities based on multiple influence criteria. Deep reinforcement learning was also applied in [7] for influence maximization. Meanwhile, prior investigations have mostly been designed on the basis of a static network topology. However, users' online/offline status and topic preferences make it cumbersome to apply conventional methods in real scenarios. Based on this analysis, [14] concentrated on time-sensitive influence maximization. An interchange greedy approach was applied in [15] to maximize joint influence under a single network. With this, better performance was said to be achieved in both influence coverage and running time.
To improve both influence spread and time efficiency, a Multipath Asynchronous Threshold (MAT) model was designed in [16] by taking both neighboring and non-neighboring influencers into account. A network representation learning model was proposed in [17] that used diffusion-based processes to identify influential nodes. A new optimization model using independent cascade and linear threshold was designed in [18] to maximize influence spread. Based on the above research and analysis, the proposed method constructs a four-step process to perform influential node tracing in social networks. That is, by considering the normalized Spearman correlation and Katz Centrality Least Angle Regression, the network structure information used for influential node identification is maximized with minimum response time.

Proposed Work
This section explains the design and development of TSKC-LAR for influential node tracing in social networks. Here, four different steps are combined to design an algorithm for influential node selection, covering service migration and service recommendation. Figure 1 given below shows the block diagram of Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) for influential node tracing in social networks. As illustrated in the figure, four different processes are followed to identify the influential nodes: preprocessing, feature extraction, feature selection and influential node tracing. First, with the Telecom dataset provided as input, ineffective nodes are removed by means of the Node Cover Preprocessor model. Second, feature extraction or node extraction is performed by employing the Trilateral Statistical Node Extraction model, where only the optimal features or nodes are extracted. Next, the nodes used in identifying the influential nodes are selected from the normalized nodes by applying the Spearman Correlated Normalized Feature Selection model. Finally, Katz Centrality Least Angle Regression is applied to the selected nodes for influential node tracing. A detailed description of the proposed method is given below.

Node Cover Preprocessor Model
With the evolution of social networks and their extensive use, the size of the social network is growing day by day. This creates certain issues in identifying the most influential nodes, such as memory deficiency and processing slowdowns. In this work, to address this issue, a Node Cover Preprocessing (NCP) model is designed that pre-processes the sample prior to the execution of the influential node selection algorithm itself. The NCP model in turn minimizes memory use, thereby improving the detection of influential nodes and accelerating the overall process. The proposed method uses NCP as a pre-processing step. Figure 2 given below shows the block diagram of the Node Cover Preprocessing (NCP) model.
As shown in the above figure, let us consider the social network modeled as a directed network ' G = (V, E) ', where ' V ' represents the individual nodes (i.e., 600 nodes) or social network users, while ' E ' represents the links between the individual nodes or social network users. In addition, each edge ' (a, b) ∈ E ' is associated with a propagation probability ' Prop^G_{a,b} ' specifying the strength of the effect of individual node or social network user ' a ' on ' b ', under one of two conditions, effective or potential: ' C: (a, b) ∈ E → Eff/Pot '.
On one hand the effective edge ' E Eff ' refers to the relationship between online users, and on the other hand, the potential edge ' E Pot ' refers to that at least one user on both sides of the edge is offline. Due to the nature of users' online status and time constraints of information dissemination, network structure is distinct.
For two pieces of information ' In_i ' and ' In_j ' to be transmitted at different time intervals ' T_i ' and ' T_j ', ' T_i ' denotes the effective time for information ' In_i ', obtained using the deadline of information ' t^{In_i}_{end} ' and the start time of information ' t^{In_j}_{start} ' as given below.
Next, because the impact of users on their neighbors varies considerably, the propagation probability decreases as the distance ' d ' increases. This is mathematically expressed as given below.
Finally, the propagation probability ' Prop ' for a directed network graph ' G ' of two different social network users ' a ' and ' b ' is mathematically expressed as given below.

Fig. 2 Block diagram of node cover preprocessor model

From the above Eq. (3), the propagation probability for removing ineffective nodes in the network is obtained by integrating the gradual loss ' μ ' and the information to be propagated between two nodes or users ' a ' and ' b ' over ' T_i '. The pseudo code representation of the Node Cover Preprocessor model is given below.

Algorithm 1 Explanation
As given in the above Node Cover Preprocessor model, three different measures are evaluated with the objective of eliminating the ineffective nodes. First, the effective time is measured. Second, with the time obtained, the gradual loss based on the distance is evaluated. Finally, the propagation probability is measured to obtain the effective nodes and eliminate the ineffective nodes.
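The three measures described above can be illustrated with a minimal sketch. Since the exact forms of Eqs. (1) to (3) are not reproduced in this text, everything specific here is an assumption made for illustration only: the `Edge` record, the exponential form of the gradual loss, the `base_prob` field, the `decay` parameter and the `threshold` used to decide whether a node is effective are all hypothetical and are not the paper's definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    a: str            # source user
    b: str            # target user
    base_prob: float  # hypothetical prior influence of a on b
    distance: float   # distance d between the two users
    t_start: float    # start time of the information
    t_end: float      # deadline of the information


def effective_time(edge: Edge) -> float:
    # Assumed form: deadline minus start time.
    return edge.t_end - edge.t_start


def gradual_loss(distance: float, decay: float = 0.5) -> float:
    # Assumed form: a loss factor that shrinks as distance grows.
    return math.exp(-decay * distance)


def propagation_probability(edge: Edge, decay: float = 0.5) -> float:
    # No time left to propagate means no influence.
    if effective_time(edge) <= 0:
        return 0.0
    return edge.base_prob * gradual_loss(edge.distance, decay)


def effective_nodes(edges, threshold: float = 0.1, decay: float = 0.5) -> set:
    # Keep the endpoints of every edge whose probability reaches the threshold.
    kept = set()
    for edge in edges:
        if propagation_probability(edge, decay) >= threshold:
            kept.update((edge.a, edge.b))
    return kept


edges = [
    Edge("T1", "T2", 0.8, 1.0, 0.0, 5.0),  # close and timely: kept
    Edge("T2", "T3", 0.8, 8.0, 0.0, 5.0),  # too distant: dropped
    Edge("T4", "T5", 0.8, 1.0, 5.0, 5.0),  # no effective time: dropped
]
kept = effective_nodes(edges)
print(sorted(kept))
```

The sketch only mirrors the order of the steps (effective time, then distance-based loss, then propagation probability); any monotonically decreasing loss function could be substituted for the exponential.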

Trilateral Statistical Node Extraction
In this section, once the ineffective nodes have been eliminated and the effective nodes selected, node extraction is performed by applying a trilateral statistical function. Here, the trilateral statistical function refers to the application of three different hypotheses and the extraction of nodes based on them. For the purpose of influential node tracing in social networks, a node in our work is defined as a set of points, under the hypotheses that the number of points ' P = P_1, P_2, …, P_n ' must be above the threshold ' T ' and that the points must be within a predefined transmission range ' TR '. Figure 3 given below shows the block diagram of the Trilateral Statistical Node Extraction model.
As shown in the above figure, with the preprocessed nodes (i.e., 500 nodes) provided as input, three hypotheses are applied with the aid of statistical functions, using the base station ' BS_i ' obtained via its location and the user ID ' UID_i ', to extract nodes for identifying the influential node in the social network. The first hypothesis is expressed as given below.
From the above Eq. (4), the base station ' BS_i ' is recalculated by taking the average of its members (via ' UID '), i.e., ' sumof(UID_i) / N '. The second hypothesis is expressed as given below.
From the above Eq. (5), the base station ' BS_i ' saves the current user ' UID_i '. The third hypothesis is expressed as given below.
From the above Eq. (6), the base station ' BS_i ' is displaced with the current point ' P_i ' by eliminating the current user (' Clear UID_i '). In this way, by employing the location and significance of each node with respect to the others, the most vital nodes required for identifying the influential node are extracted. The pseudo code representation of Trilateral Statistical Node Extraction is given below.

Algorithm 2 Explanation
As given in the above Trilateral Statistical Node Extraction algorithm, if the distance ' Dis ' is smaller than the threshold distance ' DisT ', the point ' P_i ' is detected inside the transmission range ' TR '. The point index ' i ' is then added to the prevailing user IDs ' UID_i ', and the base station ' BS ', with its corresponding location given by latitude and longitude, is recalculated by taking the mean location of its members (i.e., users via ' UID '). If the distance ' Dis ' is larger than the distance threshold ' DisT ' and the number ' N ' of its members (i.e., users via ' UID ') is larger than the threshold ' UIDT ', then the point ' P_i ' is detected outside the transmission range ' TR ' while user points are already stored in ' UID_i '; therefore the node index ' i ' is incremented by 1 to save the current user. Finally, the base station ' BS_i ' is displaced with the current point ' P_i ', and ' UID_i ' is cleared and then reinitiated with the index ' i '.
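The three cases described above can be sketched as a single pass over the points. This is one possible reading of the explanation, not the authors' pseudo code: the function name `extract_nodes`, the use of planar Euclidean distance instead of latitude/longitude distance, and the final flush of the last group after the loop are assumptions made for illustration.

```python
import math


def extract_nodes(points, dis_t, uid_t):
    """Group consecutive points around a moving base station.

    points: list of (x, y) locations, indexed by arrival order.
    dis_t:  distance threshold DisT (transmission range).
    uid_t:  member-count threshold UIDT.
    Returns a list of (base_station, member_indices) pairs.
    """
    if not points:
        return []
    nodes = []
    bs = points[0]  # current base station location
    uid = [0]       # indices of the current members
    for i in range(1, len(points)):
        p = points[i]
        if math.dist(p, bs) < dis_t:
            # Case 1: point is inside the range, so add it and
            # recompute the base station as the mean of its members.
            uid.append(i)
            n = len(uid)
            bs = (sum(points[k][0] for k in uid) / n,
                  sum(points[k][1] for k in uid) / n)
        else:
            # Case 2: point is outside the range; save the group
            # only if it has more than UIDT members.
            if len(uid) > uid_t:
                nodes.append((bs, list(uid)))
            # Case 3: move the base station to the current point and
            # restart the member list with the current index.
            bs = p
            uid = [i]
    # Assumption: a sufficiently large trailing group is also saved.
    if len(uid) > uid_t:
        nodes.append((bs, list(uid)))
    return nodes


points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (10, 11)]
nodes = extract_nodes(points, dis_t=4, uid_t=3)
print(nodes)
```

With ' DisT = 4 ' and ' UIDT = 3 ', the four nearby points form one extracted node centred at their mean location, while the two distant points form a group that is too small to be kept.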

Spearman Correlated Normalized Feature Selection
In this section, with the extracted nodes, features are selected by integrating spearman correlation and a normalization function. Figure 4 shows the block diagram of Spearman Correlated Normalized Feature Selection model.
As shown in the above figure, the green circle is the transmission range or the boundary of the node (i.e., user ID), the yellow circle is the node centre (i.e., the base station location) and the small points are the locations of the users. For a sample of size ' n ', with ' n ' representing the number of extracted nodes ' NE ', the ' n ' raw scores ' NE_i ', ' NE_j ' are transformed to ranks ' r_{NE_i} ', ' r_{NE_j} ' and ' r_s ' is evaluated as given below.
\[ r_s = \rho_{r_{NE_i}, r_{NE_j}} = \frac{\operatorname{cov}(r_{NE_i}, r_{NE_j})}{\sigma_{r_{NE_i}} \, \sigma_{r_{NE_j}}} \tag{7} \]
From the above Eq. (7), the rank correlation ' r_s ' is measured based on the Pearson correlation coefficient ' ρ ' applied to the ranks, the covariance of the ranks of the extracted nodes ' cov(r_{NE_i}, r_{NE_j}) ', and the standard deviations of the ranks ' σ_{r_{NE_i}} ', ' σ_{r_{NE_j}} '. When all ranks are distinct integers, the correlation between extracted nodes for selecting normalized nodes is expressed as given below.
\[ r_s = 1 - \frac{6 \sum \left( r_{NE_i} - r_{NE_j} \right)^2}{n \left( n^2 - 1 \right)} \tag{8} \]
From the above Eq. (8), ' r_{NE_i} − r_{NE_j} ' refers to the difference between the ranks of two correlated extracted nodes ' NE_i ' and ' NE_j ', with ' n ' representing the total number of observations. Normalization is necessary because, as the algorithm advances, each succeeding feature may explain less of the outcome of the selected nodes ' NS ', so the change in ' 1 − r_s Prob_BS * Prob_NE_j ' obtained by splitting ' NE_j ' becomes smaller. The consequence would be a coarser selection of base stations, resulting in premature convergence. Normalization ensures that this is not the case, so that features are selected consistently at each stage of the algorithm. The pseudo code representation of Spearman Correlated Normalized Feature Selection is given below.

Algorithm 3 Explanation
As given in the above Spearman Correlated Normalized Feature Selection algorithm, the objective is to improve the accuracy involved in tracing the influential nodes in the social network. This is achieved by applying two different functions, Spearman correlation and normalization. First, applying the Spearman correlation shows the significance of each candidate node and allows further analysis for obtaining the influential node. Next, employing normalization avoids premature convergence and hence makes it possible to analyze all the base stations when obtaining the influential node.
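The rank correlation used in this step can be checked numerically with a short sketch. It computes the Spearman coefficient in two standard ways, as the Pearson correlation of the ranks and through the rank-difference formula, and the two agree when there are no ties. The function names and the sample scores are illustrative assumptions; the normalization step of the algorithm is not covered here because its exact form is not recoverable from the text.

```python
import math


def ranks(values):
    # Rank 1 goes to the smallest value; assumes no tied values.
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_from_covariance(x, y):
    # Pearson correlation applied to the ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)


def spearman_from_rank_differences(x, y):
    # 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid for distinct ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical raw scores for two extracted nodes.
ne_i = [10, 20, 30, 40, 50]
ne_j = [12, 25, 22, 48, 60]
print(spearman_from_covariance(ne_i, ne_j))
print(spearman_from_rank_differences(ne_i, ne_j))
```

For these sample scores only two neighbouring ranks are swapped, so both routes give a coefficient of about 0.9.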

Katz Centrality Least Angle Influence Node Tracing
Finally, with the selected features or nodes, the actual influential node tracing in the social network is performed by applying Katz Centrality to the Least Angle Regression model. Let ' NS = NS_1, NS_2, …, NS_p ' be the predictors and ' Y ' be the response, and let ' NS ' also denote the matrix whose columns are the predictors. The regression coefficients ' β = β_1, β_2, …, β_p ' then provide the estimator response ' Y′ ', which is expressed as given below.
From the above Eq. (10), the estimator response ' Y′ ' is obtained by employing the selected node ' NS_j ' and the regression coefficient ' β_j '. With this, Katz Centrality is applied to the estimator response for measuring the relative influence, and this is expressed as given below.
From the above Eq. (11), the Katz Centrality ' C_Katz ' for measuring influential nodes based on the neighboring and non-neighboring sets is obtained from the constant ' α = 0.5 ' and the bias constant ' β '. Moreover, ' NS_ji ' takes the value ' 1 ' if a predictor node is connected to node ' j ' and the value ' 0 ' if it is not. Next, with the aid of the LAR model, ' β ' is selected by minimizing the total squared error subject to a constraint, as expressed below.
From the above Eq. (12), the estimator response is evaluated based on the minimization of the total squared error, thereby considering the neighborhood and non-neighborhood sets while identifying the influential nodes in the social network. The pseudo code representation of Katz Centrality Least Angle Influence Node Tracing is given below.

Algorithm 4 Explanation
As given in the above Katz Centrality Least Angle Influence Node Tracing algorithm, the objective is to retrieve the influential node in the social network by minimizing the total squared error subject to a constraint while selecting the predictors. First, the regression coefficients are used to accurately derive the relationship between the predictor variables (i.e., the selected nodes) and the response (i.e., the influential node). Second, a centrality factor is used that considers both the neighboring and non-neighboring nodes while tracing the influential node.
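The centrality step can be illustrated with a minimal sketch of the standard Katz centrality recurrence, in which each node's score is ' α ' times the sum of its neighbors' scores plus a bias constant. The least angle regression fit is not shown. The function name `katz_centrality`, the parameter name `bias` (standing in for the bias constant ' β '), the iterative solver and the three-node example graph are assumptions for illustration. Note that the iteration only converges when ' α ' is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, which holds for this small graph with ' α = 0.5 '.

```python
def katz_centrality(adj, alpha=0.5, bias=1.0, tol=1e-12, max_iter=1000):
    """Iterate x_i = alpha * sum_j adj[j][i] * x_j + bias until stable.

    adj[j][i] is 1 if node j is connected to node i, otherwise 0.
    """
    n = len(adj)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [alpha * sum(adj[j][i] * x[j] for j in range(n)) + bias
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; alpha may be too large for this graph")


# Hypothetical undirected path graph: 0 - 1 - 2.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_centrality(adj)
influential = max(range(len(scores)), key=scores.__getitem__)
print(scores, influential)
```

In this example the middle node, which is connected to both others, receives the highest score and is therefore reported as the most influential.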

Experimental Settings
The proposed Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) method and the existing methods, Community Based Influence Maximization Algorithm (CBIMA) [1] and Community based algorithm for Finding Influential Nodes (CFIN) in complex social networks [2], are implemented in Java for experimental evaluation. The evaluation is performed with the Telecom Dataset http://sguangwang.com/TelecomDataset.html [19][20][21][22]. The dataset, provided by Shanghai Telecom, consists of more than 7.2 million records of Internet access via 3,233 base stations from 9,481 mobile phones, acquired over a period of six months. This dataset is very commonly used for influential node selection, serving topics such as edge server placement, service migration and service recommendation. For the experiments, six parameters, namely Month, Date, Start Time, End Time, Base Station Location and Mobile Phone ID, are taken as input for influential node tracing in the social network. The performance of the proposed and existing methods is evaluated using different metrics, namely sensitivity, specificity, response time, accuracy and influence spread, with respect to the number of nodes.

Qualitative Analysis
In this section, a qualitative analysis is carried out with a sample of 10 nodes in the social network. First, the Node Cover Preprocessing (NCP) model is applied to the sample nodes, and the information to be transmitted at different time intervals is estimated, as given below in Table 1, using the deadline of information and the start time of information. After this, the propagation probability is evaluated based on the distance ' d '.
Based on the propagation probability from (3), the effective nodes selected are ' T 1 ', ' T 2 ', ' T 4 ', ' T 6 ', ' T 9 ' and ' T 10 ' respectively. With the obtained effective nodes, most vital nodes required for identifying the influential node have to be extracted. This is performed by means of distance threshold ' DisT ' and user threshold ' UIDT '. Let the distance threshold be assumed to be ' DisT = 4 ' and user threshold to be assumed to be ' UIDT = 3 ' (Table 2).
Then from the first hypothesis, the base station ' BS i ' is recalculated by taking the average of its members, i.e., given as below.
Then, the extracted nodes ' NE ' are transformed to ranks ' r_{NE_i} ', ' r_{NE_j} ' and ' r_s ' is computed using (7), as given below.
Covariance: ' cov(r_{NE_i}, r_{NE_j}) = 5000 '. The rank correlation ' r_s ' is measured based on the Pearson correlation coefficient ' ρ ', the covariance of the ranks of the extracted nodes ' cov(r_{NE_i}, r_{NE_j}) ', and the standard deviations of the ranks ' σ_{r_{NE_i}} ', ' σ_{r_{NE_j}} ' as given below. Normalization for ' NE_j ' is evaluated as given below.
Normalization for ' NE_i ' is evaluated as given below. So the features selected are ' NE_i = T_4, T_6 ' (Table 4). Finally, the influential node is estimated based on the Katz Centrality using (11) and (12). Let us assume the regression coefficient ' β_j ' to be in the range ' {0.7 to 1.0} ' and the constant ' α = 0.5 '; then the estimator response for node ID ' T_4 ' is given as below.
In a similar manner, the estimator response for node ID ' T 6 ' is given as below.
Therefore, the influential node is ' T 6 '.

Case 1: Sensitivity
In influential node tracing, sensitivity is a measure of how well a test can identify true positives. In other words, sensitivity is the ability of the method to trace influential nodes correctly. To estimate it, we calculate the ratio of true positives to all actual influential node cases. Mathematically, this can be stated as given below.
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{13} \]
From the above Eq. (13), the sensitivity rate ' Sensitivity ' is measured based on the true positives ' TP ' (i.e., the number of nodes correctly identified as influential nodes) and the false negatives ' FN ' (i.e., the number of nodes incorrectly identified as normal nodes), and is reported as a percentage. The sensitivity of a test shows how well it can classify samples (i.e., nodes) that have the condition. In other words, a high sensitivity value means that a test correctly classifies most influential nodes. Table 5 given below shows the sensitivity rate of the proposed TSKC-LAR and the existing methods, CBIMA [1] and CFIN [2]. Figure 5 given above shows the sensitivity measure for tracing the influential node in the social network. A simulation of 5000 nodes with unique user IDs (i.e., mobile phones) is considered and the sensitivity rate is measured accordingly. With ' 500 ' nodes considered for simulation, ' 42 ' nodes were correctly identified as influential nodes and ' 10 ' nodes were incorrectly identified as normal nodes using TSKC-LAR; ' 40 ' and ' 15 ' nodes, respectively, using [1]; and ' 38 ' and ' 20 ' nodes, respectively, using [2]. The sensitivity rate was therefore observed to be 80.76%, 72.72% and 65.5% respectively. From the simulation results it is inferred that, although the sensitivity rate decreases as the number of nodes increases, the sensitivity rate using TSKC-LAR is better than that of [1] and [2]. This improvement is due to the application of the Node Cover Preprocessor algorithm. By applying this algorithm, the effective time is measured first, after which the gradual loss based on the distance is evaluated. Finally, the propagation probability is estimated to arrive at the effective nodes and discard the ineffective nodes.
With this, the sensitivity rate using the TSKC-LAR method is found to be improved by 10% compared to [1] and 22% compared to [2].
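The sensitivity figures quoted for the 500-node case can be reproduced from the stated counts with a short sketch. The function name and the dictionary layout are illustrative assumptions; the counts are those reported in the text.

```python
def sensitivity(tp: int, fn: int) -> float:
    # Percentage of actual influential nodes that were identified.
    return 100.0 * tp / (tp + fn)


# (true positives, false negatives) for 500 simulated nodes.
counts = {"TSKC-LAR": (42, 10), "CBIMA": (40, 15), "CFIN": (38, 20)}
sens = {name: sensitivity(tp, fn) for name, (tp, fn) in counts.items()}
for name, value in sens.items():
    print(f"{name}: {value:.2f}%")
```

The computed values are approximately 80.77%, 72.73% and 65.52%, which match the reported 80.76%, 72.72% and 65.5% up to rounding.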

Case 2: Specificity
In influential node tracing, specificity is a measure of how well a test can identify true negatives. It refers to the percentage of true negatives out of all the samples that do not have the condition (true negatives and false positives). Mathematically, this can be stated as given below.
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{14} \]
From the above Eq. (14), the specificity rate ' Specificity ' is measured based on the true negatives ' TN ' (i.e., the number of nodes correctly identified as normal nodes) and the false positives ' FP ' (i.e., the number of nodes incorrectly identified as influential nodes). A test with a high specificity value correctly classifies the nodes without the condition more often than a test with a lower specificity. Table 6 given below shows the specificity rate of the proposed TSKC-LAR and the existing methods, CBIMA [1] and CFIN [2]. Figure 6 given above illustrates the specificity rate for different numbers of nodes accessing the Internet through base stations. With this specificity rate evaluation, service migration and service recommendation can be made in an efficient manner. With ' 500 ' nodes considered for simulation, ' 380 ' nodes were correctly identified as normal nodes and ' 15 ' nodes were incorrectly identified as influential nodes using TSKC-LAR; ' 360 ' and ' 20 ' nodes, respectively, using CBIMA [1]; and ' 340 ' and ' 35 ' nodes, respectively, using CFIN [2]. The overall specificity was therefore observed to be 96.2%, 94.73% and 90.66% respectively. From the results it is inferred that the specificity rate using TSKC-LAR is better than that of [1] and [2]. The improvement in specificity is due to the application of the propagation probability, which evaluates the distance on the basis of the neighborhood factor and therefore assists in removing the ineffective nodes. With this, the specificity of the TSKC-LAR method is said to be improved by 4% compared to [1] and 11% compared to [2].
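As a quick check of the specificity values for the 500-node case, the ratio of true negatives to all actual normal nodes can be evaluated directly from the stated counts. The function name and data layout below are illustrative assumptions.

```python
def specificity(tn: int, fp: int) -> float:
    # Percentage of actual normal nodes that were identified as normal.
    return 100.0 * tn / (tn + fp)


# (true negatives, false positives) for 500 simulated nodes.
normal_counts = {"TSKC-LAR": (380, 15), "CBIMA": (360, 20), "CFIN": (340, 35)}
spec = {name: specificity(tn, fp) for name, (tn, fp) in normal_counts.items()}
for name, value in spec.items():
    print(f"{name}: {value:.2f}%")
```

The results, roughly 96.20%, 94.74% and 90.67%, agree with the reported 96.2%, 94.73% and 90.66% up to rounding.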

Case 3: Accuracy
The third parameter used for analysis is the accuracy rate. The accuracy of a method is its ability to distinguish normal and influential nodes correctly. It is evaluated as the ratio of the nodes correctly traced to the nodes considered for evaluation. Mathematically, this is stated as:

$$Acc = \frac{N_{CR}}{N_i} \times 100 \qquad (15)$$

In Eq. (15), the accuracy rate 'Acc' is evaluated from the sample nodes involved in influential node tracing 'N_i' and the nodes correctly traced 'N_CR'. It is expressed as a percentage (%). Table 7 shows the accuracy rate of the proposed TSKC-LAR and the existing methods, CBIMA [1] and CFIN [2]. Figure 7 shows the accuracy of tracing the influential nodes in the social network based on the trajectory of users. From the figure it is inferred that the accuracy rate decreases as the number of nodes considered for simulation increases. In other words, increasing the nodes increases the overall number of nodes involved in influential node tracing and therefore reduces the accuracy rate. However, the simulation shows an improvement using TSKC-LAR when compared to CBIMA [1] and CFIN [2]. With 500 nodes considered for simulation, 480 nodes were correctly traced using TSKC-LAR, 465 using [1] and 450 using [2], giving an overall accuracy of 96%, 93% and 90%, respectively. The improvement in the accuracy rate is due to the application of the Trilateral Statistical Node Extraction algorithm, which extracts the optimal nodes for influential node tracing. Here, the point detected inside the transmission range, along with its location given by latitude and longitude, is taken into account for obtaining the optimal nodes. With this, the accuracy using TSKC-LAR is improved by 5% compared to [1] and 9% compared to [2].
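The accuracy figures quoted for 500 nodes can be reproduced with a short sketch, assuming that accuracy is the share of correctly traced nodes among all sample nodes, expressed as a percentage, as the text describes. The names `accuracy` and `correctly_traced` are hypothetical.

```python
def accuracy(n_correct: int, n_total: int) -> float:
    """Accuracy = correctly traced nodes / sample nodes, as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


N_TOTAL = 500  # sample nodes used in the quoted simulation

# Correctly traced nodes reported in the text for each method.
correctly_traced = {"TSKC-LAR": 480, "CBIMA [1]": 465, "CFIN [2]": 450}

accuracy_by_method = {
    method: accuracy(n_cr, N_TOTAL) for method, n_cr in correctly_traced.items()
}

for method, value in accuracy_by_method.items():
    print(f"{method}: {value:.0f}%")
```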

Case 4: Response Time
The response time in our work refers to the time consumed in identifying the influential nodes in the social network. Mathematically, this is stated as:

$$RT = N_i \times Time_{NTracing} \qquad (16)$$

In Eq. (16), the response time 'RT' is measured from the sample nodes involved in simulation 'N_i' and the time consumed in tracing a node 'Time_NTracing'. It is measured in milliseconds (ms). Table 8 shows the response time of the proposed TSKC-LAR and the existing methods, CBIMA [1] and CFIN [2]. Figure 8 illustrates the response time involved in tracing the influential nodes in the social network for the distinct user IDs obtained at different time intervals. For a fair comparison, the same numbers of nodes and user IDs are used in all three methods when measuring the response time. From the figure, the response time is directly proportional to the number of nodes, i.e., increasing the number of nodes increases the node deployment and consequently a linear rise is observed in the response time. With 500 nodes considered for experimentation, the time consumed in tracing a single node in the network framework was 0.185 ms using TSKC-LAR, 0.215 ms using [1] and 0.230 ms using [2]. From these results it is inferred that the response time is lower using TSKC-LAR than using [1] and [2]. The improvement is due to the extraction of the optimal nodes for influential node tracing via the Trilateral Statistical function. By applying this function, three hypotheses are checked using the location and the significance of each node with respect to the others, and the most vital nodes required for identifying the influential node are extracted. With these nodes, TSKC-LAR identifies the influential nodes in less time, reducing the response time by 12% compared to [1] and 17% compared to [2].
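The linear growth of the response time can be illustrated with a small sketch. It assumes that Eq. (16) is the product of the number of sample nodes and the per-node tracing time, which is how the text describes it; the function name `response_time` is hypothetical, and the totals it produces are derived here rather than reported in the text.

```python
def response_time(num_nodes: int, time_per_node_ms: float) -> float:
    """Assumed form of Eq. (16): RT = N_i * Time_NTracing, in milliseconds."""
    return num_nodes * time_per_node_ms


# Per-node tracing times (ms) quoted in the text for the 500-node run.
per_node_ms = {"TSKC-LAR": 0.185, "CBIMA [1]": 0.215, "CFIN [2]": 0.230}

totals_ms = {
    method: response_time(500, t) for method, t in per_node_ms.items()
}

for method, total in totals_ms.items():
    print(f"{method}: {total:.1f} ms for 500 nodes")
```

Under this assumption the 500-node totals come out at roughly 92.5 ms, 107.5 ms and 115 ms, and doubling the number of nodes doubles each total, which matches the linear rise described for Figure 8.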

Case 5: Influence Spread
Finally, the influence spread over the whole network is analyzed in this section. Influence maximization refers to the problem of identifying a small subset of nodes (seed nodes) in a social network that maximizes the spread of influence. Table 9 shows the influence spread using the three methods, TSKC-LAR, CBIMA [1] and CFIN [2]. Figure 9 shows the influence spread for seed set sizes in the range of 10 to 50. From the figure it is inferred that all three methods maximize the influence spread, and that TSKC-LAR achieves a higher spread than CBIMA [1] and CFIN [2]. This is due to the application of the Katz Centrality Least Angle Influence Node Tracing algorithm. With this algorithm, the total squared error is minimized while selecting the predictors during retrieval of the influential node. In addition, the relationship between the predictor variables (i.e., the nodes selected) and the response (i.e., the influential node) is estimated on the basis of the regression factor. Finally, both neighboring and non-neighboring nodes are involved while tracing the influential node based on the centrality factor. This in turn improves the influence spread using the TSKC-LAR method by 26% compared to [1] and 54% compared to [2].

Table 10 compares the results of the TSKC-LAR method with the existing methods [1, 2], giving the average value of each of the five metrics. The proposed TSKC-LAR method achieves a sensitivity of 78%, against 71% and 64% for the existing methods, and a specificity of 94%, against 90% and 85%. Its accuracy is 94%, against 90% and 87%, and its response time is 189 ms, against 215 ms and 229 ms. The influence spread of the proposed TSKC-LAR is 1430, against 1130 and 930 for the existing methods. From this analysis it is evident that the proposed TSKC-LAR method performs better than the existing methods in terms of sensitivity, specificity, accuracy, response time and influence spread.
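The improvement percentages quoted in this section can be related to the average values above with a short sketch. It assumes that each improvement is the relative change of the TSKC-LAR average with respect to the average of the existing method; the helper `relative_change` and the dictionary layout are hypothetical.

```python
def relative_change(proposed: float, baseline: float) -> float:
    """Percentage change of `proposed` relative to `baseline`."""
    return 100.0 * (proposed - baseline) / baseline


# Average values quoted in the text, ordered as (TSKC-LAR, [1], [2]).
averages = {
    "sensitivity (%)": (78, 71, 64),
    "specificity (%)": (94, 90, 85),
    "accuracy (%)": (94, 90, 87),
    "response time (ms)": (189, 215, 229),
    "influence spread": (1430, 1130, 930),
}

improvements = {}
for metric, (proposed, ref1, ref2) in averages.items():
    improvements[metric] = (
        relative_change(proposed, ref1),
        relative_change(proposed, ref2),
    )
    vs1, vs2 = improvements[metric]
    print(f"{metric}: {vs1:+.1f}% vs [1], {vs2:+.1f}% vs [2]")
```

Under this reading the influence spread gain comes out near 26.5% and 53.8%, and the response time changes by about -12.1% and -17.5%, in line with the quoted 26% and 54%, and 12% and 17%. Small differences for the other metrics are to be expected because the averages quoted in the text are rounded.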

Conclusion
An efficient Trilateral Spearman Katz Centrality-based Least Angle Regression (TSKC-LAR) method for influential node tracing in social networks is developed with the objective of increasing the influence spread. The key objective of the TSKC-LAR method is to ensure influence maximization and to minimize the response time during influence tracking. This objective is attained by applying the Trilateral Statistical Node Extraction algorithm and the Katz Centrality Least Angle Influence Node Tracing algorithm. First, effective nodes were identified by means of the Node Cover Preprocessor. Next, the Trilateral Statistical Node Extraction algorithm extracted the optimal nodes via three hypotheses. Moreover, to avoid premature convergence, normalized node selection was performed by applying normalization and Spearman correlation. Finally, the influential node in the social network was identified by minimizing the total squared error. The efficiency of the TSKC-LAR method is evaluated in terms of sensitivity, specificity, accuracy, influence spread and response time, and compared with state-of-the-art works. The simulation results show that the TSKC-LAR method performs better, with a higher influence spread and a lower response time than the state-of-the-art works.
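For readers unfamiliar with the centrality measure named above, the following is a minimal sketch of plain Katz centrality computed by fixed-point iteration. It is not the authors' Katz Centrality Least Angle Influence Node Tracing algorithm; the toy graph, the function name `katz_centrality` and the parameter values `alpha` and `beta` are illustrative assumptions.

```python
def katz_centrality(adj, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Katz centrality via the fixed-point iteration x = alpha * A^T x + beta.

    `adj` maps each node to the list of nodes it links to; every listed
    node must also be a key. The iteration converges when alpha is smaller
    than 1 / (largest eigenvalue of the adjacency matrix).
    """
    nodes = list(adj)
    x = {v: 0.0 for v in nodes}
    for _ in range(max_iter):
        new_x = {v: beta for v in nodes}
        for u in nodes:
            for v in adj[u]:
                # Node u passes an attenuated share of its score to v.
                new_x[v] += alpha * x[u]
        if sum(abs(new_x[v] - x[v]) for v in nodes) < tol:
            return new_x
        x = new_x
    raise RuntimeError("Katz iteration did not converge; try a smaller alpha")


# Toy undirected graph (each edge listed in both directions):
# a triangle a-b-c with an extra node d attached to a.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

scores = katz_centrality(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], {node: round(score, 4) for node, score in scores.items()})
```

In this toy graph the best-connected node `a` ranks first, while the pendant node `d` ranks last because it is reached only through `a`. Unlike degree centrality, Katz centrality also credits a node for its indirect connections, attenuated by `alpha` per hop.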

Declarations
Conflicts of interest None.