6.1. Datasets
One of the most frequently used datasets in previous works is the LIAR dataset [20]. LIAR is a multi-class dataset with six labels, collected from traditional news outlets such as TV and radio as well as political campaigns. It contains 12,836 short statements labeled 'True', 'Mostly-True', 'Half-True', 'Barely-True', 'False', and 'Pants-on-Fire'. Other valuable columns include the statement, subjects, speaker, and the speaker's job.
There are several other fake news datasets, such as the BuzzFeed corpus [21], the Satire dataset [22], CREDBANK [23], and FEVER [24], each with its own characteristics; researchers select all or part of them for their work. One striking common feature is that the news labels are binary (True or False). For this reason, many papers build binary fake news detection models, and multi-class datasets are converted to binary ones by thresholding before use.
A dataset was recently introduced that collects 24,517 news items from three fact-checking websites (Politifact.com, Snopes.com, and Truthorfiction.com) covering September 1995 to January 2021 [25]. Most of the collected news comes from Politifact, with about 60 percent of the dataset; Snopes stands in second place with about 35 percent; and Truthorfiction is last with only about 5 percent. Although these websites use different methods to label news, the dataset assigns one of five standard labels to each item (True, Mostly-True, Half-True, Mostly-False, and False). Two features distinguish this dataset from others: it is multi-class and it contains up-to-date news [25].
We selected this dataset because it gathers recent data from reliable websites. The writing style of fake news keeps evolving, and this evolution affects detection, so more recent data helps build an effective detection model. After removing every row with a null value, the dataset consists of 23,935 items, as shown in Fig. 4, with five labels (True, Mostly-True, Half-True, Mostly-False, and False).
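This preprocessing step amounts to loading the data and dropping the incomplete rows; a minimal sketch with pandas (the file name and the 'label' column name are assumptions for illustration, not the actual names used in the released dataset):

```python
import pandas as pd

# Hypothetical file name; the dataset from [25] is loaded and rows with null values are dropped.
df = pd.read_csv("fact_checked_news.csv")
df = df.dropna()  # 24,517 rows -> 23,935 rows after removing nulls

print(df.shape)
print(df["label"].value_counts())  # assumes the label column is named "label"
```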
We perform the experiments on an Intel i7-4710HQ CPU, an NVIDIA GeForce GTX 850M GPU, and 12 GB of memory. We use Scikit-learn, one of the most popular Python libraries for classification, regression, and clustering, to create the first-level classifiers: Random Forest, SVM, Decision Tree, LGBM, and XGBoost. The stacking ensemble network is also built with Scikit-learn; it combines the outputs of the individual classifiers and uses a meta-classifier to compute the final prediction.
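A minimal sketch of how such a stacking ensemble can be assembled with Scikit-learn (the hyperparameters and the logistic-regression meta-classifier are illustrative assumptions, not the exact configuration of our experiments):

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# First-level classifiers used as base estimators of the stacking ensemble.
estimators = [
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("svm", SVC(probability=True)),   # probabilities let the meta-classifier use soft outputs
    ("dt", DecisionTreeClassifier()),
    ("lgbm", LGBMClassifier()),
    ("xgb", XGBClassifier()),
]

# The meta-classifier combines the base predictions into the final label.
stacking_model = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions are used to train the meta-classifier
)

# X_train / y_train are the extracted features and labels (assumed to be prepared beforehand).
# stacking_model.fit(X_train, y_train)
# y_pred = stacking_model.predict(X_test)
```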
6.2. Evaluation
Several metrics are commonly used for evaluation, such as Accuracy, Precision, Recall, and F1-score; we compare models with these.
Accuracy: Defined as the number of correctly predicted data instances over the total number of cases.
$$Accuracy=\frac{\left|TP\right|+\left|TN\right|}{\left|TP\right|+\left|TN\right|+\left|FP\right|+\left|FN\right|}\tag{6}$$
Precision: Defined as the proportion of correctly predicted positive instances among all instances predicted as positive.
$$Precision=\frac{\left|TP\right|}{\left|TP\right|+\left|FP\right|}\tag{7}$$
Recall: Defined as the proportion of correctly predicted positive instances among all instances that actually belong to the positive class.
$$Recall=\frac{\left|TP\right|}{\left|TP\right|+\left|FN\right|}\tag{8}$$
F1-score: The harmonic mean of Precision and Recall.
$$F1\text{-}score=2\times\frac{Precision\times Recall}{Precision+Recall}\tag{9}$$
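These four metrics can be computed directly with Scikit-learn; a small sketch (y_test and y_pred are assumed to come from the trained model above, and macro averaging for the multi-class case is an assumption):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_test: ground-truth labels, y_pred: model predictions (assumed available)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")

print(f"Accuracy={accuracy:.4f} Precision={precision:.4f} "
      f"Recall={recall:.4f} F1={f1:.4f}")
```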
Besides the popular evaluation metrics mentioned above, Cost Per Example (CPE) measures how heavily a multi-class model is penalized for wrong predictions. To evaluate a multi-class fake news classification model, a cost matrix is constructed, as shown in Table 2; this kind of matrix is particularly informative in the multi-class setting. Eq. (10) shows how the CPE is calculated [26].
$$CPE=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{m}CM\left(i,j\right)\times C\left(i,j\right)\tag{10}$$
where CM(i,j) and C(i,j) are the confusion and cost matrices, N denotes the total number of test samples, and m is the number of classes. The confusion matrix is a square matrix in which each row corresponds to an actual class and each column to a predicted class: CM(i,j) is the number of samples that belong to class i but were classified as class j, so the elements on the main diagonal count the correctly classified samples. The cost matrix has the same structure, with values between zero and one, and C(i,j) is the penalty for classifying an instance of class i as class j. The main diagonal of C is therefore always zero, since it corresponds to correct classifications; the cost is zero in the best case and one in the worst case. This matrix may not be optimal, but it provides a benchmark for evaluating our model: the closer the calculated penalty is to zero, the smaller the penalty the model receives and the better it performs [26].
Table 2
Cost matrix used for the CPE evaluation

| | True | Mostly True | Half True | Mostly False | False |
|---|---|---|---|---|---|
| True | 0 | \(\frac{1}{4}\) | \(\frac{1}{2}\) | \(\frac{3}{4}\) | 1 |
| Mostly True | \(\frac{1}{4}\) | 0 | \(\frac{1}{4}\) | \(\frac{1}{2}\) | \(\frac{3}{4}\) |
| Half True | \(\frac{1}{2}\) | \(\frac{1}{4}\) | 0 | \(\frac{1}{4}\) | \(\frac{1}{2}\) |
| Mostly False | \(\frac{3}{4}\) | \(\frac{1}{2}\) | \(\frac{1}{4}\) | 0 | \(\frac{1}{4}\) |
| False | 1 | \(\frac{3}{4}\) | \(\frac{1}{2}\) | \(\frac{1}{4}\) | 0 |
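A minimal sketch of the CPE computation in Eq. (10), using the cost matrix of Table 2; the confusion matrix is assumed to come from the trained model's predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Cost matrix C from Table 2: the penalty grows with the distance between labels
# (0 on the diagonal, 1 for confusing 'True' with 'False').
LABELS = ["True", "Mostly True", "Half True", "Mostly False", "False"]
C = np.array([[abs(i - j) / 4 for j in range(len(LABELS))] for i in range(len(LABELS))])

def cost_per_example(cm: np.ndarray, cost: np.ndarray) -> float:
    """Eq. (10): total penalty over all predictions, averaged over the N test samples."""
    return float((cm * cost).sum() / cm.sum())

# y_test and y_pred are assumed to come from the trained stacking model.
# cm = confusion_matrix(y_test, y_pred, labels=LABELS)
# print(cost_per_example(cm, C))
```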
Table 3 reports the accuracy of the five base classifiers separately. While we describe our model with Accuracy, Precision, Recall, and F1-score, some previous works report only some of these evaluation metrics, so each comparison is made with the metrics that were reported.
Table 3
Accuracy of the individual first-level classifiers

| Model | Accuracy |
|---|---|
| SVM | 72% |
| Decision Tree | 78% |
| Random Forest | 79% |
| XGBoost | 82% |
| LGBM | 84% |
We train our model on the multi-class dataset in two different ways. To compare with the binary models of Huang et al. [7], Shu et al. [28], Zhou et al. [4], and Palani et al. [27], the multi-class dataset must be converted into a binary one. First, we convert the labels to binary form by changing Mostly-True to True and Mostly-False to False and leaving out the Half-True labels, then train the model on the result; the evaluation report is shown in Table 4. Second, we train the model on the multi-class data and compare it with Rashkin et al. [6], who provide only the F1-score; all evaluation metrics of our model are reported in Table 5. Each previous work uses a different dataset, but Politifact data is common to all of them, so in this part of our experiments we train and evaluate our model on the Politifact data. The train and test sizes are 80% and 20%, respectively.
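A sketch of this binary label conversion and the 80/20 split (the 'label' column name and the exact label strings are illustrative assumptions; df refers to the dataset loaded earlier):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Map the multi-class labels to binary form and drop the Half-True items.
binary_map = {
    "True": "True", "Mostly-True": "True",
    "False": "False", "Mostly-False": "False",
    # "Half-True" is intentionally left out of the mapping.
}
binary_df = df[df["label"] != "Half-True"].copy()
binary_df["label"] = binary_df["label"].map(binary_map)

# 80% / 20% train-test split used in our experiments.
train_df, test_df = train_test_split(
    binary_df, test_size=0.2, stratify=binary_df["label"], random_state=42
)
```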
Table 4
Comparison of the result of our model on Politifact data (binary classification)
| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Huang et al. [7] | 76 | 75 | 75 | 75 |
| Shu et al. [28] | 87 | 86 | 89 | 88 |
| Zhou et al. [4] | 89 | 87 | 90 | 89 |
| Palani et al. [27] | 93 | 92 | 91 | 92 |
| Stacking Ensemble Network | 96.24 | 96.67 | 96.74 | 96.71 |
Table 5
Comparison of the result of our model on Politifact data (multi-class classification)
| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Rashkin et al. [6] | --- | --- | --- | 22 |
| Stacking Ensemble Network | 94.40 | 94.31 | 94.02 | 94.15 |
Our primary goal is to build a multi-class fake news detection model using diverse data from Politifact, Snopes, and Truthorfiction, the three most famous fact-checking websites [25]. Having reported all the comparative evaluations, we now give the classification report on the complete data, in both multi-class and binary form, in Table 6; the confusion matrix of the multi-class model is given in Table 7.
Table 6
Comparison of the results of our model on all data (Politifact + Snopes + Truthorfiction)

| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| The Proposed Model (Multi-Class Form) | 83.60 | 83.97 | 81.94 | 82.81 |
| The Proposed Model (Binary Form) | 91.52 | 93.21 | 93.65 | 93.43 |
Table 7
Confusion matrix of our model on multi-class data (rows: actual label; columns: predicted label); CPE = 0.0487

| Actual label | True | Mostly True | Half True | Mostly False | False |
|---|---|---|---|---|---|
| True | 1641 | 112 | 38 | 53 | 4 |
| Mostly True | 112 | 600 | 41 | 23 | 3 |
| Half True | 91 | 68 | 660 | 18 | 3 |
| Mostly False | 71 | 51 | 28 | 522 | 1 |
| False | 15 | 34 | 24 | 6 | 568 |