4.1 Procedure
This study followed five main stages: Collect, Assess and Clean, Analyze, Model, and Visualize. The Collect stage started with selecting an appropriate instrument for collecting the required data. The selected instrument was a questionnaire adopted from the literature. Minor modifications were made to the adopted measurements to match the study purpose, and the questionnaire was translated from English into Arabic. Back-translation was used to verify the quality of the translation [5, 7]. Before the main questionnaire was distributed, a pilot questionnaire was administered using a convenience sampling method, and final modifications were made to address the challenges it revealed and to ensure clarity and a reasonable response time. The data collected in the pilot study were excluded from the main study to avoid the bias associated with convenience sampling [28]. Thereafter, the main questionnaire was distributed.
The Assess and Clean stage started after the questionnaire was closed. Its main purpose was to support the Analyze stage so that it would yield valid and reliable results. Data assessment was therefore carried out to identify issues in data content and structure, i.e., completeness, validity, accuracy, and consistency. These issues were then resolved in the cleaning step by, for example, excluding identical responses, filling in missing values, and transforming the data to numerical scales.
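The cleaning steps named above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: "identical responses" is read here as duplicate submissions, the wording of the three middle Likert labels is assumed (the text names only the endpoints), and filling a missing answer with the item's median is an assumed rule, since the imputation method is not stated.

```python
from statistics import median

# Assumed label wording; the source only names the two endpoints.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def clean(responses):
    """Return numeric, de-duplicated responses with missing answers filled."""
    # 1) Exclude identical responses, keeping the first occurrence.
    seen, unique = set(), []
    for row in responses:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2) Transform labels to the 1-5 scale; None marks a skipped question.
    numeric = [[None if a is None else LIKERT[a] for a in row] for row in unique]

    # 3) Fill each missing value with that item's median (an assumed rule).
    for col in range(len(numeric[0])):
        observed = [row[col] for row in numeric if row[col] is not None]
        fill = median(observed)
        for row in numeric:
            if row[col] is None:
                row[col] = fill
    return numeric

# Hypothetical raw responses to three items.
raw = [
    ["agree", "strongly agree", "neutral"],
    ["agree", "strongly agree", "neutral"],   # identical to the first row
    ["disagree", None, "agree"],              # one skipped question
    ["strongly disagree", "agree", "agree"],
]
cleaned = clean(raw)
```

Here the second row is dropped as a duplicate, and the skipped answer is filled from the two observed answers to the same item.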
The Analyze stage included descriptive and inferential statistics. The descriptive statistics comprised the mean, median, standard deviation (SD), maximum (Max), and minimum (Min), while the inferential statistics comprised p-values and coefficients. These statistics helped to identify the relationships between variables and validate the hypothesis.
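As an illustration of the descriptive part only, the five summary statistics can be computed with Python's standard `statistics` module. The scores below are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, as the text does not say which form was reported.

```python
from statistics import mean, median, stdev

def describe(scores):
    """Summarize one variable's scores with the statistics named in the text."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),   # sample standard deviation (assumed form)
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical per-respondent scores on the 1-5 scale.
fatigue_scores = [2.0, 3.5, 4.0, 3.0, 2.5]
summary = describe(fatigue_scores)
```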
All previous stages built a solid basis for the Model stage, in which an ML model was built to predict which users were likely to suffer from privacy fatigue. This stage involved splitting the data using 5-fold cross-validation; training the model using several classification algorithms drawn from the literature, i.e., SVM, NB, KNN, DT, and RF; evaluating each classifier’s accuracy, recall, precision, and F1 score; and finally selecting the most accurate algorithm based on the evaluation results.
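The evaluation loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's implementation: a small hand-written KNN (with an assumed k of 3) stands in for the five algorithms, the data are synthetic, and the function names are hypothetical. In practice an ML library would normally supply both the classifiers and the cross-validation routine.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def evaluate_with_k_fold(X, y, k_folds=5):
    """Average each metric over k_folds train/test splits."""
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for test_idx in k_fold_indices(len(X), k_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i]) for i in test_idx]
        fold = metrics([y[i] for i in test_idx], preds)
        for name in totals:
            totals[name] += fold[name] / k_folds
    return totals

# Synthetic stand-in data: two features per respondent, label 1 = fatigued.
rng = random.Random(1)
X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)] + \
    [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(50)]
y = [0] * 50 + [1] * 50
scores = evaluate_with_k_fold(X, y)
```

Repeating the same loop for each candidate algorithm and comparing the averaged accuracy corresponds to the selection step.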
4.2 Measurement development
Measurements of all the study variables were derived from related works [5, 29–31]. Some of the questions were slightly modified to match the study’s purpose and scope (Table 2). All questions were measured using a five-point Likert scale ranging from “strongly disagree” to “strongly agree”.
Fatigue, in general, is characterized by emotional exhaustion and cynicism, as discussed in the literature [5]. Therefore, the adopted scale measures individuals’ privacy fatigue along these two core dimensions, each with three items [5]. This scale was selected because it was the first scale developed to measure privacy fatigue; moreover, subsequent studies have adopted it and reported its validity and reliability [17, 22].
To measure IPA, two models were combined. Several studies have measured IPA indirectly through privacy concerns, assuming that individuals’ concerns about future risks are associated with their awareness of potential risks [32, 33]. To measure IPA directly, however, the Information Privacy Awareness (IPA) model identifies three aspects of privacy awareness that researchers should consider [30]. These aspects cover the awareness of:
1) The element related to information privacy.
2) The element’s existence in the current environment.
3) The element’s impact.
The element could be a technology, regulation, or practice. In this study, the element is practice: the users’ level of awareness of the practice of collecting, using, and sharing personal data for social media ads was measured.
Considering these three aspects, the Information Privacy Concerns (IPC) model was adopted [29]. The model is comprehensive and has been employed by several studies to measure internet privacy concerns, with each study choosing the dimensions and items of the model appropriate to its purpose [22, 26]. The model includes several dimensions of internet privacy concerns, such as personal information collection, unauthorized secondary usage, and improper access. Each dimension includes a set of valid and reliable items. Therefore, appropriate items were chosen from the IPC model [29] with respect to the IPA aspects [30]. The items were modified to address awareness instead of concerns, so as to measure awareness directly, as suggested by Correia and Compeau (2017).
To measure participants’ personality traits, the BFI-10, a 10-item version of the Big Five Inventory, was employed [31]. Given participants’ limited time, a short instrument was needed, and two scales measure the Big Five traits with only 10 items: the Ten-Item Personality Inventory (TIPI) and the BFI-10 [31, 34]. Both measure extraversion, neuroticism, openness, conscientiousness, and agreeableness using two items each, and both have been shown to be simple, reliable, and valid. However, the BFI-10 is the more recent of the two and was clearer when translated into Arabic.
Table 2
Measurement of the variables
Variables | Items | Sources |
Privacy fatigue | 1. I feel irritable when dealing with privacy issues associated with social media ads (e.g., using my personal information on these ads) 2. I think I am tired of privacy issues regarding social media ads 3. It is tiresome for me to care about privacy on social media 4. I have become less interested in privacy issues regarding social media ads 5. I have become less enthusiastic about protecting personal information provided to social media 6. I doubt the significance of privacy issues on social media more often | [5] |
IPA | 1. I am aware that social media platforms are collecting too much personal information about me 2. I am aware that when I give personal information to social media platforms for some reason, it would be used for other reasons such as ads 3. I am aware that social media platforms would sell and share my personal information with advertising companies 4. I am aware of how my personal information on social media platforms will be collected, processed, and used for advertising purposes 5. In general, I’m aware that it would be risky to use my personal information for social media ads 6. I am aware that there would be a high potential for loss associated with the collected, shared, and used personal information by social media for advertising purposes 7. I am aware that collecting, sharing, and using my personal information for social media ads would lead to many unexpected problems 8. I am aware that there would be too much uncertainty associated with the collected, shared, and used personal information by social media for advertising purposes | [29, 30] |
Personality trait | 1. I see myself as someone who is outgoing, sociable 2. I see myself as someone who is generally trusted 3. I see myself as someone who is relaxed, handles stress well 4. I see myself as someone who is dependable, does a thorough job 5. I see myself as someone who has few artistic/creative interests 6. I see myself as someone who has an active imagination 7. I see myself as someone who is reserved, withdrawn 8. I see myself as someone who tends to be lazy 9. I see myself as someone who gets nervous easily 10. I see myself as someone who is critical, tends to find fault with others | [31] |
4.3 Sample
The demographics of the respondents are summarized in Table 3. The sampling technique used for the main study was probability sampling, specifically simple random sampling [28]. This technique was chosen to ensure unbiased random selection and a representative sample, as each member of the population has an equal chance of being selected, which yields more accurate and generalizable results [35].
Because this research involves human participants, all methods were carried out in accordance with relevant guidelines and regulations. The experimental protocols and procedures were approved by the research committee of the Information Systems department at King Abdulaziz University, as the study follows common, predefined regulations, does not expose any personal information, and does not include any dangerous or harmful activities. In addition, the following guidelines were applied based on the committee’s recommendations to maintain confidentiality and anonymity, increase the response rate, ensure the quality of the data, and gain the respondents’ trust:
- The purpose of the study was declared at the beginning of the questionnaire.
- It was clearly stated that the data would be collected and used for study purposes only.
- It was made clear that respondents had the right to skip any question or to stop at any stage if they did not want to continue.
- Participation consent was required; therefore, respondents had the opportunity to agree or disagree to participate.
- Questions that identified respondents were not included.
The main questionnaire was distributed from October 10, 2022, to October 30, 2022, through social media platforms such as Instagram, Snapchat, WhatsApp, and Telegram. Of the 538 responses received, 508 remained after exclusions.
Female respondents formed the majority of the sample (399; 79%). Just over half of the participants (260; 51%) were in the 20–34 age group, and 333 (66%) held a Bachelor’s degree. The occupations were evenly spread across four main groups: 116 students (23%), 116 private sector employees (23%), 112 unemployed participants (22%), and 108 employees of public/government organizations (21%). Regarding social media usage, 502 participants (99%) used social media daily. Of these daily users, 225 (45%) reported spending 3 hours or more per day on social media on average.
Table 3
Demographics of the sample
Measure | Items | Frequency | Percentage |
Gender | Female | 399 | 79% |
Gender | Male | 109 | 21% |
Age | 19 and below | 42 | 8% |
Age | 20–34 | 260 | 51% |
Age | 35–44 | 111 | 22% |
Age | 45 and above | 95 | 19% |
Educational Qualification | High school or below | 89 | 17% |
Educational Qualification | Diploma | 40 | 8% |
Educational Qualification | Bachelor’s degree | 333 | 66% |
Educational Qualification | Master’s degree or higher | 46 | 9% |
Occupation | Student | 116 | 23% |
Occupation | Employed at public/government organization | 108 | 21% |
Occupation | Employed at private sector | 116 | 23% |
Occupation | Self-employed | 17 | 3% |
Occupation | Unemployed | 112 | 22% |
Occupation | Retired | 39 | 8% |
Social media usage frequency | Daily | 502 | 99% |
Social media usage frequency | Weekly | 4 | 1% |
Social media usage frequency | Monthly | 0 | 0% |
Social media usage frequency | Less often | 2 | 0% |
If daily, the average time spent on social media | 1 hour and a half or less | 62 | 12% |
If daily, the average time spent on social media | 2 hours to 2 hours and a half | 113 | 23% |
If daily, the average time spent on social media | 3 hours or more | 225 | 45% |
If daily, the average time spent on social media | I don't know | 101 | 20% |
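The percentages in Table 3 follow from dividing each frequency by the number of respondents in that group and rounding to a whole number. The short sketch below reproduces the gender and age rows and the share of responses retained after exclusions; the function name is hypothetical, and the counts are taken from the table and the text.

```python
def percentages(counts, base=None):
    """Convert frequency counts to whole-number percentages of a base."""
    base = sum(counts.values()) if base is None else base
    return {item: round(100 * n / base) for item, n in counts.items()}

gender = {"Female": 399, "Male": 109}
age = {"19 and below": 42, "20-34": 260, "35-44": 111, "45 and above": 95}

gender_pct = percentages(gender)   # {'Female': 79, 'Male': 21}
age_pct = percentages(age)         # {'19 and below': 8, '20-34': 51, '35-44': 22, '45 and above': 19}

# Share of the 538 received responses that remained after exclusions.
retained_pct = round(100 * 508 / 538, 1)   # 94.4
```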
4.4 Data analysis justification
The variables of this study, i.e., privacy fatigue, IPA, and personality traits, are latent variables [37, 38], meaning that each is measured using multiple questions. Various methods can be used to analyze latent variables, such as structural equation modeling (SEM) or taking the sum or mean of all the questions’ scores for a particular variable [37, 38]. The advantage of SEM is that it considers the reliability and validity of the questions during the analysis. However, with a small sample size, SEM performs poorly at detecting the actual effect [38]. Therefore, considering the sample size of this study and assuming, based on the literature, that all questions are reliable and valid, the mean of the answers for each variable was calculated and used for analysis. There are 2 questions for each personality trait, 8 for IPA, and 6 for privacy fatigue, and each answer ranged from 1 to 5 on the five-point Likert scale.
However, this study also required a binary classification of privacy fatigue (0 or 1). Therefore, after the average was taken, respondents who scored 3 or more were classified as having a medium to high level of privacy fatigue and labeled 1, while those who scored below 3 were classified as having a low level of privacy fatigue and labeled 0.
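The mean-score approach can be illustrated with a short Python sketch. The respondent's answers and the dictionary keys are hypothetical; only the item counts (6 for privacy fatigue, 8 for IPA, 2 per personality trait) and the 1 to 5 range come from the text.

```python
from statistics import mean

def composite_scores(answers):
    """Average the 1-5 answers of each variable's items for one respondent."""
    return {variable: mean(items) for variable, items in answers.items()}

# Hypothetical answers from a single respondent.
respondent = {
    "privacy_fatigue": [4, 3, 4, 2, 3, 3],     # 6 items
    "ipa": [5, 4, 4, 3, 4, 4, 5, 3],           # 8 items
    "extraversion": [2, 4],                    # 2 items per trait
}
scores = composite_scores(respondent)
```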
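The labeling rule above amounts to a single threshold on the mean privacy fatigue score. A minimal sketch, with a hypothetical function name and example scores:

```python
def fatigue_label(mean_score, threshold=3.0):
    """Return 1 for medium-to-high privacy fatigue (score >= 3), else 0."""
    return 1 if mean_score >= threshold else 0

labels = [fatigue_label(s) for s in (1.5, 2.99, 3.0, 4.2)]
# labels == [0, 0, 1, 1]
```

Note that a score of exactly 3 falls into the medium-to-high class, as stated in the text.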