Quantifying Information Dissemination Rate during Crisis and Location Detection Using Online Social Streams

Abstract: The widespread practice of online social networking leads to the diffusion of trending information and the exchange of opinions among socially connected people. Streaming data extracted from social networks has become a vital communication tool and an informative platform for capturing real human voices at the time of emergency events such as disasters. This paper proposes an effective underlying quantification model that uses a change-point detection algorithm to detect events based on the relative streaming tweet density ratio. A morphological time-series analysis is carried out to determine the dissemination of information about crisis events using information entropy. Further, the Event Link Ratio (ELR) is estimated to obtain meaningful patterns in the identified events. This paper empirically quantifies the information dissemination of events based on users' tweeting activities. The proposed quantification method is compared with state-of-the-art techniques in terms of event detection rate and the entropy of information spread. The accuracy of the proposed method is found to be up to 94%, with events detected after 75 seconds. K-Center Clustering (KCC) is used, resulting in a location detection accuracy of 85%.


Introduction
In the current era of online technology, social networking data encapsulates a variety of huge informative blocks about real-world events and public opinion, which is insightful for providing public safety during disasters. Recently, social media has been successfully utilized as a major means of measuring the impacts of disaster events in real time from online streams. It provides many user-friendly services with user-generated content, contributing to the overwhelming amount of information at hand.
Moreover, events span diverse temporal and spatial scales with respect to geographical information science. In particular, social networking sites such as Twitter, Weibo and Facebook act as vital social sensors of disasters like earthquakes, floods and landslides, enabling immediate response and recovery [1]. In addition, geographically located social streaming data is accepted as a trustworthy source for sensing disasters online and examining reactive action after mass emergency events [2]. Online social media mining is widely used in typical disaster scenarios; one of the most vital aspects of identifying social responses is measuring people's opinion to build an improved disaster support model. In particular, users express their impulsive reactions during disasters through the Social Media Timeline (SMT) and news feeds, where a user posts (PT), tweets (TW), replies (RE), retweets (RT), shares (S) and mentions (MT), along with images and videos, to draw the attention of others and broadcast sources of disaster information. However, the sheer volume of social information streams produces substantial noise that must be filtered. By recognizing patterns in the surge of messages and the data stream, a change point can be identified in the typical progression of streaming tweets [3]. Disaster events can be perceived as spikes in activity, while their importance can be interpreted through changes in content over time.
It is very difficult to acquire in-depth traces of data spread during events, such as people's real social connections during disasters, their implicit behavioral profiles, and their situational roles in socially related activities. The extent of the information spreading process depends on the underlying heterogeneous social networks and the behavioral profiles of individual users [4]. User behavior activity is measured as a temporal series with respect to the data. This paper focuses on "popular" Twitter users, whose retweet activities are considered [5]. A new approach is proposed that incorporates the following information to understand user behavior during critical times: tweets and retweets of the targeted users, who eventually follow temporal patterns, are monitored. Web-based online social media contains abundant data about ongoing events, but also sensitive issues that remain undetected, which makes it very complicated to recognize the spreading [6] of the most important data. In this paper, we focus on Twitter information streams with the following objectives. The primary objective is to design a novel quantification model to identify disaster-related events that (i) identifies the event detection rate observed at multidimensional scales, specifically events that take place in diverse locations and temporal timelines, by computing the Event Link Ratio (ELR); (ii) remains effective despite the uncertain and unfiltered insights present in the data between dynamic time intervals, using a change-point detection algorithm; (iii) applies an information-theoretic approach to classifying Twitter users' entropy based on homogeneity in tweeting activities, including user sentiment polarity; and (iv) provides a novel evaluation method to identify the events involved in intervals using the Z-score and local-to-global ratio. This paper is organized as follows.
First, the related state-of-the-art work on various quantification models for disaster events is discussed in Section 2. The proposed work is described in Section 3.
The experimental results and research findings are discussed in Section 4. The event detection rate results are given in Section 5. Location detection is discussed in Section 6. The conclusion and future work are presented in Section 7.

Related Work
Several approaches to quantifying social media data to form support models for predictive analysis have been proposed over time. Tweeting dynamics on Twitter have been used to understand the structural properties of information flow during disasters. An in-depth study was carried out on Twitter data during the 2011 Tohoku earthquake [7]. An automatic technique to find a relevant corpus for tracking disasters was investigated thoroughly as an early warning system, and the paper identified how quickly Japanese people's concern returned to a stable level after the disasters. Based on Twitter users' tweeting, replying, and retweeting activities, a technique was proposed to distinguish Twitter users [8] according to their activities. The paper analyzed the possibility of automatic user classification and filtering based on requirements. In experiments with Twitter data from the Japan earthquake, the proposed strategy classified users according to their characteristics with higher precision than older techniques.
Tweets containing URLs have been analyzed through their combined retweeting dynamics [9]. The paper achieved a separation of different activities using two features to categorize content based on the user response it generates. Among these, the spreading processes of specific pieces of information, including studies on corpus sequences and viral diffusion behaviors, are most related to our work. The paper distinguished numerous classes of retweeting activity on Twitter: bot activity, newsworthy information spread, advertising and promotion, and political campaigns.
Sentiment characterization has been performed on user posts gathered from Twitter during Hurricane Sandy [10]. The paper visualized the sentiments on a geographical map centered on the hurricane disaster event, focusing on the extracted information and the usefulness of associated crisis maps in maintaining the emergency response. A method for measuring an affected population's response to a disaster can be built on sentiment analysis [11][12][13] and then mapped to the disaster in space and time.
Spatiotemporal statistical analysis has been carried out on noisy social media data. Moreover, while conventional approaches detect events at fixed temporal and spatial resolutions, events of different scales often occur simultaneously. A multi-scale event detection method [14] was proposed to process a data similarity graph at suitable scales and detect events of different scales through a novel graph-based clustering technique.
Quantitative research observations and strategies for evaluating mass streams of data lead to suitable scaling for events analyzed during disasters in Japan. The first and second Hayashi techniques [15] are applied where an external standard is available and are used to predict the effects of the factors considered. The Hayashi method was used to develop a spatial design capturing the mutual relationships of the data for similar users [16] and events. Word-order characteristics were quantified using Hayashi's quantification method type III (HQM) and examined using the first and second components of HQM and MDS results. Natural language processing using WordNet [17] lexicons for Twitter datasets has also been studied.
Political leaning inference [18] was framed to maximize tweet-retweet agreement with average mean error and a user-matching technique with regularization. The resulting convex optimization problem was solved for pro-Romney and Obama-bashing tweets circulated on online social networks during election events. The three-class classification problem [19,20] was reformulated as two binary classification problems using the SentiStrength algorithm [21]. In those experiments, the polarity of tweets was classified as positive, negative or neutral using machine-learning classifiers trained on bi-gram, tri-gram and lexicon-based features, and their combinations [22]. An entropy-based metric has been reported to represent sentiment in social media data. Various events have been detected and visualized using Twitter microblogs during natural hazard events [23], crisis mapping [24], emergency situation awareness from Twitter [25], disaster prediction on Twitter data [26], entropy-based event detection [27], real-time event detection for online behavioral analysis [28], [29], Twitter-based traffic event detection [30], and emerging topic detection in the Twitter stream [31], all aiming to provide situational awareness through social media. Several papers [32]-[34] discuss quantifying event information spread using online social networks. Eagle & Pentland introduced a system for sensing complex social systems with data collected from 100 mobile phones; Bluetooth-enabled mobile telephones were used to measure information disseminated in different contexts through Shannon entropy. Moreover, social patterns were recognized in daily user activity to infer relationships and identify socially significant locations [35]. Shetty & Adibi proposed entropy models to study whether information flows in an organization's keyword graphs are relevant or not, with results reviewed in two different experiments based on entropy models [36].
This entropy model identifies the most interesting and important nodes in a keyword graph and is partially adopted in the proposed work.

Identifying Change Point Detection
Event burst distribution can be identified by monitoring Twitter continuously using RuLSIF change-point detection. The method sets up an instantaneous alert for immediate attention as soon as a real-time unexpected event is detected. The burst is measured from the density of similar tweets that burst out in a specified time interval. In particular, the overall frequency of words wi tweeted in a time interval is directly proportional to the burst of an event Ej for that interval. When the tweet density exceeds a certain peak-period threshold and reaches a saturation point, a change point is identified. Denoting the relative tweet density-ratio estimator as f(Y), the α-relative Pearson (PE) divergence can be approximated using Eq. (1).
For our experiments, the RuLSIF-based change-point detection algorithm is used to directly estimate the relative tweet density ratio, where 0 ≤ α < 1 is a parameter.
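Since Eq. (1) is not reproduced in the text, the following is a minimal, assumed sketch of RuLSIF-style relative density-ratio estimation between a reference window and a test window of the tweet-density series. Gaussian kernels centered on the reference samples and the hyperparameter values (α, σ, λ) are illustrative choices, not the paper's settings.

```python
import numpy as np

def rulsif_pe_divergence(x, y, alpha=0.1, sigma=1.0, lam=0.1):
    """Estimate the alpha-relative Pearson (PE) divergence between
    samples x (reference window) and y (test window) via RuLSIF,
    using Gaussian kernels centred on the x samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    centers = x

    def phi(z):
        # Gaussian kernel design matrix: one column per centre
        return np.exp(-(z - centers.T) ** 2 / (2 * sigma ** 2))

    phi_x, phi_y = phi(x), phi(y)
    n_x, n_y, b = len(x), len(y), len(centers)
    # H mixes second moments of the kernel features under x and y
    H = alpha * phi_x.T @ phi_x / n_x + (1 - alpha) * phi_y.T @ phi_y / n_y
    h = phi_x.mean(axis=0)
    # Regularized least-squares fit of the relative density ratio
    theta = np.linalg.solve(H + lam * np.eye(b), h)
    r_x, r_y = phi_x @ theta, phi_y @ theta
    # Plug-in estimate of the alpha-relative PE divergence
    return (-alpha * np.mean(r_x ** 2) / 2
            - (1 - alpha) * np.mean(r_y ** 2) / 2
            + np.mean(r_x) - 0.5)
```

A change point is then flagged whenever the divergence between adjacent windows of the streaming tweet density exceeds a chosen threshold.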

3.1.1.Tweet Analyzing Parameters
The Document Incidence (DI) is the number of tweets in which the word appears. The Global Frequency Rate (GFR) is the total number of times the word appears within the tweet dataset between the First Interval (FI) and the Last Interval (LI). The Burst Ratio (BR) is obtained by calculating a Z-score of how frequently the word appears in the chosen interval relative to its average frequency across all intervals. A high Z-score means the word is unusually frequent and therefore likely to be a good descriptive word for novel or rare topics discussed within the interval. Novelty is a percentage value that represents the degree to which tweets in an interval discuss a topic that is novel relative to surrounding intervals. A novelty of 0% indicates that every term in the selected interval was the same as in other intervals; conversely, 100% indicates that every term in that interval differed from other intervals, as defined by the burst-term model selection. Homogeneity is a percentage value that represents the degree to which tweets within an interval use the same keywords.
The homogeneity value is interpreted as follows: 0%-30% indicates that tweets have distinct content in the given interval; 31%-70% indicates that tweets have similar content; and 71%-100% indicates high retweeting activity of similar content. The Event Link Ratio (ELR) is the ratio between the number of tweets containing URLs linked to the disaster and the aggregate number of tweets in the given interval. The ELR ranges between 0 and 1: a low ELR means few URLs are linked with the event of interest, while the value approaches one for highly linked events. Tweets that spread false news on a large scale regarding an event are captured by the false panic rate and classified as bots in our experiments. A non-trivial parameter for detecting interconnected events during a disaster is the Temporal Burst Ratio (TBR), the ratio between Novelty and Burstiness for the time interval.
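The Burst Ratio and ELR definitions above can be sketched directly. This is an illustrative implementation under the stated definitions; the function names and the `disaster_url` field are hypothetical, not from the original system.

```python
from statistics import mean, pstdev

def burst_z_score(counts_per_interval, interval):
    """Burst Ratio (BR): Z-score of a word's frequency in one interval
    relative to its average frequency across all intervals."""
    mu = mean(counts_per_interval)
    sd = pstdev(counts_per_interval)
    return 0.0 if sd == 0 else (counts_per_interval[interval] - mu) / sd

def event_link_ratio(tweets):
    """ELR: tweets carrying a disaster-linked URL over all tweets in
    the interval; ranges between 0 and 1."""
    if not tweets:
        return 0.0
    linked = sum(1 for t in tweets if t.get("disaster_url"))
    return linked / len(tweets)
```

A word whose Z-score clearly exceeds its cross-interval average (e.g. above 1.5-2.0) is treated as bursty for that interval.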
The sentiment polarity of the tweets broadly indicates the polarity of the event. Tweets are categorized as SENT+ and SENT-, taking values in the ranges {+1 to +5} and {-1 to -5} respectively.

3.1.2.Probability Distribution of Tweets
With reference to the parameters mentioned above, the probability of a Twitter event burst is identified to follow a binomial distribution during disasters, using Eq. (4). The probability of the aggregated number of tweets containing the lexicon word wk at time T(wj) is denoted P(nj,k), where N is the tweet count in the given period of the time-series evaluation. Although the number of tweets Ni varies in each time interval ti, it can be re-scaled across all time intervals by uniformly normalizing the frequency of the words responsible for the event burst. From the above distribution, pk is the expected probability that a tweet contains the word wk in a randomly chosen time interval; it is defined as the mean of the observed probabilities of wk across the intervals containing wk, where C is the count of intervals comprising word wk. We determine whether a word wk is bursty by comparing the actual probability that wk occurs in the interval T(wj) against the probability pk of wk occurring in a random interval. If the calculated value is greater than pk, the word wk visibly exhibits anomalous behavior in the given time T(wj).
Furthermore, this identifies the word wk as a bursty word (tweet) in time T(wj).
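Since Eq. (4) is not reproduced in the text, the sketch below assumes the standard binomial form P(n) = C(N, n) p^n (1 - p)^(N - n) for the burst probability, with the burstiness test comparing the observed rate in an interval against the expected rate pk, as described above. The function names are illustrative.

```python
from math import comb

def binomial_burst_prob(n_obs, n_total, p_expected):
    """P(n_obs occurrences of word w_k among n_total tweets) under a
    binomial model with expected per-tweet probability p_expected."""
    return (comb(n_total, n_obs)
            * p_expected ** n_obs
            * (1 - p_expected) ** (n_total - n_obs))

def is_bursty(n_obs, n_total, p_expected):
    """Flag w_k as bursty when its observed rate in the interval
    exceeds the expected rate p_k from random intervals."""
    return n_obs / n_total > p_expected
```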

Entropy Based Quantification-Mathematical Model
Twitter user activities can be converted into an information-theoretic measure, where m is the parameter that decides the unit of the time interval. The time-interval entropy on topic words can be derived using the following equations.
To measure entropy, the extent of the user distribution is used to determine user entropy on topic words. Let the random variable D represent a distinct user in trace Ti, with all possible values {d1, d2, ..., dnD}, counting the number of retweets from user Ui in trace Ti. Then pF represents the probability density function of D, such that PF(di) gives the probability of each retweeting activity taken by online user fi; the frequency with which a tweet of user a ∈ U is retweeted by b to user c ∈ U is expressed by the following equation.
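The user-distribution entropy described above can be sketched as the Shannon entropy of the empirical retweeting-user distribution in a trace. This is a minimal illustration of the quantity, not the paper's exact formulation; the equation itself is not reproduced in the text.

```python
from collections import Counter
from math import log2

def user_entropy(retweet_users):
    """Shannon entropy of the distribution of retweeting users in a
    trace: low when a few users dominate, high when activity is
    spread evenly across many users."""
    counts = Counter(retweet_users)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

For example, a trace where four distinct users each retweet once is maximally spread (entropy 2 bits), while a trace dominated by one user has entropy 0.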
Given the frequencies of occurrence Ch(j) of a particular hashtag mentioned by various users over a continuous set of time intervals indexed by j, we can calculate the normalized hashtag entropy on topic words as follows.
The above equation ensures that the probability calculations are normalized so that the entropy is finite for hashtags. The value 0.01 is used to normalize the entropy for fuzzy crisp dataset values. In our analysis, hashtags yielding high entropy are considered to indicate significant long-running phenomena that appear with uniform frequency over time.
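As the normalized hashtag entropy equation is not reproduced in the text, the sketch below assumes a standard normalization by the log of the interval count (the 0.01 smoothing constant mentioned above is omitted); hashtags spread uniformly across intervals score near 1, and short-lived ones near 0.

```python
from math import log2

def normalized_hashtag_entropy(occurrences):
    """Shannon entropy of a hashtag's per-interval frequencies,
    normalized to [0, 1] by log2 of the number of intervals."""
    counts = [c for c in occurrences if c > 0]
    total = sum(counts)
    if total == 0 or len(occurrences) < 2:
        return 0.0
    probs = [c / total for c in counts]
    h = -sum(p * log2(p) for p in probs)
    return h / log2(len(occurrences))
```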
In addition, Similar-User Entropy is calculated using the burst words involved in tweets that are typically retweeted more than 100 times within the same group of users. It is important to measure the similarity between different tweets corresponding to the same event. In our baseline event detection process, we measure the similarity between every pair of tweets Ta and Tb as Sim(Ta, Tb).
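The Sim(Ta, Tb) formula is truncated in the text; a common choice for pairwise tweet similarity is cosine similarity over the tweets' term sets, sketched here purely as an assumed stand-in.

```python
def tweet_similarity(tweet_a, tweet_b):
    """Cosine similarity over the term sets of two tweets; an assumed
    stand-in for the truncated Sim(Ta, Tb) formula in the text."""
    terms_a = set(tweet_a.lower().split())
    terms_b = set(tweet_b.lower().split())
    if not terms_a or not terms_b:
        return 0.0
    overlap = len(terms_a & terms_b)
    return overlap / (len(terms_a) ** 0.5 * len(terms_b) ** 0.5)
```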

Event Detection and Quantifying Models
The proposed work identifies disaster-related events based on tweet lexicons and evaluates the retweeting dynamics during the disaster. The proposed work model is shown in the figure. The change-point detection using RuLSIF is calculated with the relative Pearson divergence for α in the range 0.7 to 0.9, showing that the detected change points capture the dynamics of the social streams.
The dataset was collected as an unbalanced sample of 19,313 tweets; stratified sampling is used to obtain a balanced dataset of 10,127 tweets. The total volume of tweets that spiked in relation to the Assam flood and the proportion of tweets embedded with URLs are considered. For change-point detection to identify the flood in Assam, Twitter is monitored for disaster-related topic words. Tweets whose sentiment score is below the tweet threshold (S < TWs) are filtered and considered for sentiment analysis processing.
The datasets are analyzed, and the experimental results show that the Assam flood tweets yield 3% Novelty, 85% Homogeneity, and an Event Link Ratio of 0.81. The empirical sentiment values show that tweets carry highly negative sentiment during the disaster, where people use mostly negating words. The 3% novelty indicates an overall unique topic similarity across and between the intervals considered. The 85% similarity of vocabulary in topic words indicates that Twitter users tweet about the flood using identical terms. The Event Link Ratio of 0.81 shows that more than 80% of the total volume of tweets is related to the flood in Assam. The performance of the event burst distribution detection experiments was evaluated using two metrics: temporal burst ratio and false panic rate.
The experiments show that our burst detection mechanism achieves an overall burst detection rate (BDR) of 91% and a false panic rate (FPR) of 4.1%. Tweets are classified using a Naïve Bayes classifier into two classes, Class M (posts without multimedia content) and Class N (posts with multimedia content). The entropy values in Table 2 show that users retweet posts with multimedia content more than raw text tweets without it. Popular Twitter user pages show high entropy, whereas a bot user page such as Bloombrg shows very low entropy. Low-entropy hashtags are highly specific and mostly related to short-term events.

Event Detection Rate
The processing delay time needed to start monitoring and confirm event detection is 75 seconds for the proposed framework, compared with 200 seconds for the state-of-the-art method.

Event Location Detection Method
We use the following measures to evaluate the efficiency of the location detection models. In the experiment, we used the Arbitrary Method (AM), the Frequency-Based Method (FBM), and Online K-Center Clustering (OK-CC). For each user, we select the center of the circle containing the maximum number of location references.
The radius of the circle is the tolerance value, the same as N in Accuracy@N, with 25 miles used when the tolerance value N is zero. We use ACC@25, ACC@50, and ACC@75 to calculate accuracy within 25, 50 and 75 miles, respectively; the tolerance value N is computed and noted in Table 4. It is observed that, with respect to unambiguous location references, reasonable accuracy is obtained between 50 and 75 miles. Compared with all location references, the maximum accuracy of 83% is obtained at 75 miles. The effect of time on location detection and the percentage of location detail are shown in Table 5. The location is detected with the highest accuracy by the KCC algorithm as time increases, whereas the RDM and TVM methods show lower accuracy in detecting the crisis event location. The average distance error increases from 79 miles for N = 25 to 120 miles for N = 75. The online K-Center clustering algorithm proves to be an efficient no-regret online algorithm, detecting the event location cluster with 85% accuracy.
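The Accuracy@N measure described above can be sketched as the fraction of users whose predicted location falls within N miles of the true location, with distances computed by the haversine formula; the function names are illustrative and the great-circle distance is an assumption about how mileage is measured.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))  # mean Earth radius ~3958.8 mi

def accuracy_at_n(predicted, actual, n_miles):
    """ACC@N: fraction of users whose predicted (lat, lon) lies
    within n_miles of the true (lat, lon)."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if haversine_miles(*p, *a) <= n_miles)
    return hits / len(predicted)
```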

Conclusion and Future Work
This paper focused on event detection in real-time social media and reported the levels of information spread during disasters using Twitter activities with time bounds. We characterize the dynamics of tweeting activity on social media by calculating the entropy of time-interval distributions, similar users, hashtags and sentiment scores. The results yield a classification of users into real humans and bots. The proposed entropy-based quantification method identified popular users and hashtags, which help us analyze real human voices in the form of tweets during disasters; indeed, it demonstrates the transformation of social media into a news media platform. Our system detected three major flood events during 2015-2016, showing a disaster event detection rate of 94%, which is acceptably high. The proposed MapReduce framework detected the event in 75 seconds, which is more efficient than the state-of-the-art method that required 200 seconds to confirm an event. The experimental results show that the quantification strategy handles different capabilities with similarly high accuracy compared with conventional procedures while maintaining the ability to recognize alerts of sudden disaster events. With respect to unambiguous location references, compared with all location references, the maximum accuracy of 83% is obtained at 75 miles. The effect of time on location detection is identified with the highest accuracy of 75% for the KCC algorithm.
In future work, the proposed model is planned to be automated for identifying various natural disasters such as earthquakes and tsunamis, as well as man-made disasters such as terrorist attacks, in a distributed real-time environment.

DECLARATIONS:
This statement is to certify that all Authors have seen and approved the manuscript being submitted.
I warrant that the article is the Authors' original work, that it has not received prior publication, and that it is not under consideration for publication elsewhere. We have no conflict of interest to declare. No funding was allotted for this work. The data used in the research work is private data.

AUTHOR CONTRIBUTIONS
*Funding (information that explains whether and by whom the research was supported) -NOT APPLICABLE *Conflicts of interest/Competing interests (include appropriate disclosures) -NONE *Availability of data and material (data transparency) -Twitter Public data *Code availability (software application or custom code) -Customized code to achieve efficiency in Python *Authors' contributions (optional: please review the submission guidelines from the journal whether statements are mandatory) Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Writing-original draft preparation by Dr. Bhuvaneswari Anbalagan