The researcher follows an empirical research design because opinions on this topic are inherently subjective: annotation of the extracted data requires experts, and an experimental setup is needed to validate the model. Sentiment analysis in this research therefore has the potential to employ hybrid techniques.
Deep learning
As our data is numerical, an ANN computes the product of each input with a randomly generated weight, often in the range (-1, 1), adds a bias value, and feeds the sum into an activation function to determine an output value. The bias value ensures that, in the event all data points are zero (leaving no values to multiply with the weights), the neuron can still generate an output. The function f represents the activation function used to determine a non-linear output of the neuron; common choices include the following, which are also visualized in Fig. 1 below:
Sigmoid: takes a real-valued input and squashes it to the range (0, 1)
σ(x) = 1 / (1 + exp(−x))
tanh: takes a real-valued input and squashes it to the range (−1, 1)
tanh(x) = 2σ(2x) − 1
ReLU (Rectified Linear Unit): takes a real-valued input and thresholds it at zero (replaces negative values with zero)
f(x) = max(0, x)
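The three activation functions above can be sketched directly from their definitions; this is a minimal NumPy illustration (the function names are ours, not from a particular library):

```python
import numpy as np

def sigmoid(x):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # expressed via the identity tanh(x) = 2*sigmoid(2x) - 1; range (-1, 1)
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # thresholds at zero: negative inputs become 0
    return np.maximum(0.0, x)

print(sigmoid(0.0))   # 0.5
print(relu(-3.0))     # 0.0
print(relu(2.5))      # 2.5
```

Note that the tanh identity lets us verify the formula in the text against NumPy's built-in `np.tanh`.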
An ANN can comprise many different neurons, each with a different activation function depending on the context of the problem. In our case we expect the final (output) layer to use a sigmoid activation function, as our class labels lie in the range (0, 1). Functions such as tanh and ReLU are useful in the input and hidden layers of a deep network to improve performance as the network grows more complex, depending on the size of the input data.
A multi-layer network contains many neurons and, depending on the network structure, alters the weights through forward and back-propagation, using the labeled training data to determine the best model. Multi-layer networks are useful for non-linear boundaries: the outputs of neurons in earlier (hidden) layers feed into later layers to determine a classification [22]. The neural network training process and error handling through methods such as back-propagation are computationally more expensive than NB or SVM, but are also more powerful and can achieve better results in more than one context, particularly for multi-class classification problems [21].
A simple multi-layer perceptron, or deep network, can be visualized in Fig. 2 below:
Input features, labeled X1 and X2, represent data in numerical terms and, as mentioned earlier, each is associated with a weight value and passed into an activation function. One or more hidden layers can contain any number of neurons (nodes), which take the previous layer's activation result and pass it forward for further calculation; this is subsequently used in the output layer to determine a final value. The number of features in our system would be the size of the dictionary file or vocabulary, Xn, where n is the number of words in the file, and each input value would represent a one-hot encoded value of our raw text data. Finally, the output layer would consist of the number of class labels in our annotated data, ranging in (0, 1).
An ANN is trained through the processes of forward and back-propagation. Forward propagation applies the dot product of the input features with randomly assigned weights and passes the result into an activation function; the output can be represented as follows:
Y = f(x1·w1 + x2·w2 + … + xn·wn + b)
where n denotes the number of features of our data, f the activation function, and Y the output result. These output values are passed into deeper layers and eventually to the output layer, which compares the predicted result against the training labels to determine whether the model has predicted correctly. Forward propagation often returns incorrect results at first and therefore requires further processing to help the network learn and perform better, which is achieved by updating the weights through back-propagation.
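The forward pass of a single neuron can be sketched as below; the sigmoid activation, the toy input vector and the random weight range (-1, 1) are illustrative assumptions, matching the description above rather than the actual model:

```python
import numpy as np

rng = np.random.default_rng(42)

def forward(x, w, b):
    # weighted sum of the inputs plus bias, passed through a sigmoid activation
    z = np.dot(x, w) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])   # e.g. a tiny one-hot input vector
w = rng.uniform(-1, 1, size=3)  # weights drawn randomly from (-1, 1)
b = rng.uniform(-1, 1)          # bias term
y = forward(x, w, b)
print(y)  # a value in (0, 1)
```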
We calculate the total error at the output nodes and propagate these errors back through the network using back-propagation to calculate the gradients. We then use an optimization method such as gradient descent to adjust all weights in the network with the aim of reducing the error at the output layer [21]. Gradient descent is one commonly used optimization function that adjusts weights according to the error they caused. "Gradient" is another word for slope, which in its typical form on an x-y graph represents how two variables relate to each other; in this particular case, the slope we care about describes the relationship between the network's error and a single weight [23].
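The weight-update rule can be illustrated for a single weight; this is a toy sketch of gradient descent on a one-weight linear neuron (all names and values here are illustrative, not from the thesis model):

```python
# Minimise the squared error E = (y_pred - y_true)^2 of a linear neuron
# y_pred = w * x, by repeatedly stepping the weight against its gradient.
x, y_true = 2.0, 4.0   # one training example; the ideal weight is 2.0
w = 0.5                # arbitrary starting weight
lr = 0.05              # learning rate

for _ in range(200):
    y_pred = w * x
    grad = 2 * (y_pred - y_true) * x   # dE/dw: slope of the error w.r.t. w
    w -= lr * grad                     # step downhill along the gradient

print(round(w, 4))  # converges toward 2.0
```

The same update, applied to every weight via the chain rule, is what back-propagation performs across a full network.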
The network repeats these processes over iterations, or epochs, until the model has correctly learnt to classify the data. All of these factors affect the network's performance; as we discuss the design, implementation and evaluation of our model, we learn how each of these functions and algorithms can greatly improve or reduce the accuracy of the network, and so inform improved designs of our ANN.
There are many works using deep learning for natural language processing, and most use variations of the simple neural network to achieve their respective goals within their relevant contexts. However, they share a common foundation: taking raw textual data and representing individual words or word pairs as vectors for further processing. A word embedding is a distributed representation of a word, often a one-hot representation, which is suitable as input to a neural network [25]; each word corresponds to a one-hot vector whose binary values denote its presence in a document.
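The one-hot representation described above can be sketched over a small vocabulary; the vocabulary and documents here are toy examples, not the actual Amharic corpus:

```python
# Toy vocabulary; in the thesis this would be the 4000-word vocabulary file.
vocab = ["good", "bad", "service", "news"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # a vector of vocabulary length with a 1 at the word's position
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def document_vector(tokens):
    # binary presence vector for a whole document, as network input
    vec = [0] * len(vocab)
    for t in tokens:
        if t in index:
            vec[index[t]] = 1
    return vec

print(one_hot("bad"))                     # [0, 1, 0, 0]
print(document_vector(["good", "news"]))  # [1, 0, 0, 1]
```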
A convolutional neural network (CNN) is similar to a normal neural network in its operation, but its architecture is better suited to image processing problems, as the input vectors are in 3D form, and it can perform better than regular neural networks at larger scale. The efficiency of CNNs has inspired use cases for natural language tasks by transforming one-dimensional textual data into a matrix that serves as the input "image" for the network. A strong argument for CNNs is that they are fast: convolutions are a central part of computer graphics and are implemented at the hardware level on GPUs. The developed deep learning architecture for Amharic sentiment analysis is described below.
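The idea of convolving over text can be sketched in plain NumPy: a sentence becomes a matrix of word vectors, and a filter slides over windows of consecutive words. All dimensions here (6 words, 4-dimensional embeddings, a 2-word filter) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A sentence of 6 words, each mapped to a 4-dimensional embedding (toy sizes).
sentence = rng.standard_normal((6, 4))

# One convolutional filter spanning a window of 2 consecutive words.
filt = rng.standard_normal((2, 4))

def conv1d_text(matrix, kernel):
    # slide the kernel over every window of consecutive word vectors,
    # producing one feature value per window, as in a text CNN
    h = kernel.shape[0]
    return np.array([
        np.sum(matrix[i:i + h] * kernel)
        for i in range(matrix.shape[0] - h + 1)
    ])

features = conv1d_text(sentence, filt)
print(features.shape)  # (5,) — one activation per 2-word window
```

In a real text CNN many such filters of several window sizes run in parallel, followed by pooling and a dense output layer.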
Data collection
We focus in this research on a primary data source, the official FBC Facebook page, because this page operates legally under Facebook's terms and conditions and people express their ideas freely on social media. From the broad sociopolitical domain, the researcher concentrates on the following aspects only: immigration, public relations and war.
We collected the data through the Facebook API by creating a developer account and scraping with the Facepager text-processing tool. The main attributes extracted during data collection are listed in Appendix A.
Of the 1800 reviews collected on these issues, we use 1602 reviews, including emojis, together with a common vocabulary file of 4000 words, for training and testing the model.
Table 5.1 Reviews collected from Fana Broadcasting Corporation

Issue           | Number of reviews
Immigration     | 5652
Public relation | 3482
War             | 1866