We conducted our study by gathering data from Twitter through the Twitter API. Collecting tweets this way is not a new idea, but separating English tweets from tweets in other languages was not an easy task, and part of it had to be done manually. The methodologies used for the analysis were specific to this research. We collected the data with the purpose of our study in mind.
Data collection is not as easy a process as it might appear at first. We passed the keywords (hashtags) to a Python program that used the Twitter API with our credentials. The program streams the tweets that carry the given hashtag, so once it was started, Twitter began delivering tweets for the hashtags passed to it. The extracted data contained tweets in many languages, but we focused on English tweets, which were gathered and transformed into a dataset; non-English tweets were filtered out with an SQL query. After filtering, we created a dataset to classify the gathered tweets as positive or negative. Neutral tweets were ignored because they had no impact on the analysis and results. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. It is the most widely deployed database in the world, used in more applications than can be listed here, including many high-profile ones [8].
There is a trade-off between memory usage and speed: SQLite generally runs faster the more memory it is given. Nevertheless, its performance is usually quite good even in low-memory environments. Depending on how it is used, SQLite can be faster than direct file system I/O. After the hashtag was passed, the data started streaming. When the tweets stopped streaming, the data was saved into the SQLite database.
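The stream-then-store step described above can be sketched roughly as follows. This is a minimal illustration, not the program used in the study: `fake_stream` is a hypothetical stand-in for the Twitter streaming API, the sample tweets are made up, and the table layout is an assumption.

```python
import sqlite3


def fake_stream(hashtag):
    """Hypothetical stand-in for the streaming API: yields tweet-like dicts."""
    samples = [  # made-up tweets, for illustration only
        {"id_str": "1", "lang": "en", "text": f"Big rally today {hashtag}"},
        {"id_str": "2", "lang": "ur", "text": f"Aaj bara jalsa hai {hashtag}"},
        {"id_str": "3", "lang": "en", "text": f"Results are coming in {hashtag}"},
    ]
    for tweet in samples:
        yield tweet


def collect(conn, hashtag):
    """Save every streamed tweet for `hashtag`; return the stored row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, hashtag TEXT, lang TEXT, text TEXT)"
    )
    for tweet in fake_stream(hashtag):
        # INSERT OR IGNORE skips a tweet whose id is already in the table.
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (tweet["id_str"], hashtag, tweet["lang"], tweet["text"]),
        )
    conn.commit()  # persist once the stream has stopped
    count = conn.execute(
        "SELECT COUNT(*) FROM tweets WHERE hashtag = ?", (hashtag,)
    ).fetchone()[0]
    return count


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
saved = collect(conn, "#Election2018")
```

Using the tweet id as the primary key means that restarting the collection for the same hashtag does not store duplicates.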
In the next phase, we converted the collected data into tabular form using the SQLite 3 database. The tweets were in raw form and also carried fields not required for the analysis, such as follower count, friends count, status count and others, which we had to deal with at this stage.
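One way to turn raw tweets into rows is sketched below. It assumes the raw tweets are JSON objects shaped like Twitter API v1.1 statuses (`id_str`, `created_at`, `lang`, `text`, and a nested `user` object with `followers_count`, `friends_count`, `statuses_count`). The sample tweet, the table name and the column choice are illustrative assumptions, not the schema used in the study.

```python
import json
import sqlite3

# Made-up raw tweet, for illustration only.
RAW_TWEETS = [
    json.dumps({
        "id_str": "101",
        "created_at": "Wed Jul 25 10:00:00 +0000 2018",
        "lang": "en",
        "text": "Long queues at the polling station #Election2018",
        "user": {
            "followers_count": 120,
            "friends_count": 80,
            "statuses_count": 950,
        },
    }),
]


def to_row(raw_json):
    """Flatten one raw tweet into a tuple matching the table columns."""
    tweet = json.loads(raw_json)
    user = tweet.get("user", {})
    return (
        tweet["id_str"],
        tweet.get("created_at"),
        tweet.get("lang"),
        tweet.get("text"),
        user.get("followers_count"),
        user.get("friends_count"),
        user.get("statuses_count"),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "id TEXT PRIMARY KEY, created_at TEXT, lang TEXT, text TEXT, "
    "followers_count INTEGER, friends_count INTEGER, statuses_count INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?, ?)",
    (to_row(raw) for raw in RAW_TWEETS),
)
conn.commit()

# Columns needed for the analysis can be selected without the user counts.
rows = conn.execute("SELECT id, lang, text FROM tweets").fetchall()
```

Keeping the user-level counts in their own columns makes it straightforward to leave them out of later queries.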
Up to this point, we did not yet know which tweets were positive and which were negative, and the datasets to be used for preparing and analyzing the tweets still contained both English and non-English tweets.
As discussed earlier, we focus on English tweets only, but Twitter users from all over the world use different languages to express their sentiments and opinions. For this reason, we needed to filter out tweets in other languages with an SQL query.
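A language filter of this kind might look like the sketch below. It assumes each stored tweet has a `lang` column holding a language code such as `en`. The text does not state which field or query was actually used, so the table names, the column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id TEXT PRIMARY KEY, lang TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [  # made-up rows, for illustration only
        ("1", "en", "Polling has started #Election2018"),
        ("2", "ur", "Aaj vote dalna hai #Election2018"),
        ("3", "en", "Counting is under way #Election2018"),
        ("4", None, "#Election2018"),  # language could not be determined
    ],
)

# Copy only the English tweets into a separate table; rows whose lang is
# NULL or any other code do not match the condition and are left out.
conn.execute(
    "CREATE TABLE english_tweets AS SELECT * FROM tweets WHERE lang = 'en'"
)
conn.commit()

english_ids = [
    row[0] for row in conn.execute("SELECT id FROM english_tweets ORDER BY id")
]
```

A language code attached to a tweet can be wrong or missing, which is one reason a manual check of the filtered set may still be needed.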
We collected tweets from January to July 2018 using the Twitter API on five top trending political and popular topics of Pakistan. The tweets were collected by passing the keyword or hashtag to the streaming API. A hashtag is a word that uniquely identifies a topic on Twitter. The following are the top trending topics on which Twitter users expressed their sentiments at the time of the study:
- #SheikhRasheed
- #HanifAbbassi ephedrine
- #Election2018
- #Massive Rigging
- #SherAaRaha Ha
We gathered tweets for each of these topics, as depicted in Fig. 3.1 below.