A door-to-door survey was conducted to collect data from 3020 households scattered over 20 of the 25 districts of Sri Lanka, selected considering the infection-rate data provided by the Department of Census and Statistics, Sri Lanka. Households were selected using a multistage clustering technique that followed the hierarchical structure of power distribution across the administrative divisions of Sri Lanka. Each survey consisted of a face-to-face interview of around 40 minutes, with a questionnaire covering the socioeconomic and behavioral impact of the pandemic on the household. Initially, the questions related to the basic demographic information of the family were answered by the main respondent of the survey. The questions related to a particular impact area were answered by the most relevant member of each household. The responses were recorded using two types of Likert scales designed to assess the perceived impact. These Likert scales range from 1 to 5, where a rating of 1 represents "strongly agree" and a rating of 5 represents "strongly disagree" for positively worded questions. For negatively worded questions, a flipped scale was used (see the Data Availability section).
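To illustrate the flipped scale, the following minimal Python sketch reverse-codes a 1 to 5 Likert response so that positively and negatively worded questions point in the same direction. The function name `reverse_code` and the mapping `low + high - score` are assumptions made for illustration; the exact coding used in the survey is documented in the Data Availability section.

```python
def reverse_code(score, low=1, high=5):
    """Flip a Likert response on a [low, high] scale (assumed mapping)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} outside the {low}-{high} scale")
    # On a 1-5 scale this maps 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1.
    return low + high - score


# Hypothetical responses to one negatively worded question.
responses = [1, 2, 3, 4, 5]
flipped = [reverse_code(r) for r in responses]
```

Under this assumed mapping the neutral midpoint (3) is unchanged, so only the direction of agreement is reversed.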
This study has three predominant outcomes. First, it identifies the factors that influenced the overall education experience of students during the pandemic. Second, it clusters the surveyed population in an unsupervised manner and identifies the resulting groups that were similarly affected by the underlying factors affecting education. Furthermore, upon analyzing these clusters using demographics of the population such as income and education, it was concluded that the unsupervised clustering resulted in a division by income level, which is explained further in the subsequent results section. Finally, by combining the above two outcomes, this work studies how the identified factors affecting education vary over the income levels of the identified classes.
Initially, the responses related to education were extracted from the original dataset, yielding 1643 of the 3020 households (54.4%). The study employed the Kaiser–Meyer–Olkin Measure (KMO), Cronbach alpha coefficient, and McDonald’s omega coefficient, as described in the Methods section 63,64, to assess the validity and reliability of the dataset (Table 1).
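As a hedged illustration of one of these reliability measures, the sketch below computes the Cronbach alpha coefficient from its standard definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The function name `cronbach_alpha` and the toy responses are hypothetical; the values in Table 1 were obtained by the authors and are not reproduced by this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items (questions)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert responses: 4 respondents, 3 items.
toy = [[1, 2, 1], [2, 2, 3], [4, 5, 4], [5, 4, 5]]
alpha = cronbach_alpha(toy)
```

Items that move together across respondents push the coefficient toward 1, which is why a high value is read as internal consistency.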
Table 1
Validity and reliability test results obtained from the Kaiser–Meyer–Olkin Measure (KMO), Cronbach alpha coefficient, and McDonald’s omega coefficient.
| | KMO | Cronbach alpha | McDonald’s omega | Number of Variables |
|---|---|---|---|---|
| Education | 0.767 | 0.917 | 0.928 | 22 |
Principal Component Analysis (PCA) with varimax rotation 64 was performed in SPSS version 27 to extract the underlying factors of the dataset. Further, the questions associated with each factor were chosen according to the rationale presented in 45,65,66, whereby only questions with a factor loading above 0.4 were deemed representative of the underlying factor. The factor-loading threshold of 0.4 was selected consistent with the literature 63,64,67. Moreover, the study used the Kaiser criterion as the theoretical framework for factor extraction in the education dataset, as mentioned in the Methods section. Specifically, factors with eigenvalues greater than 1 were selected, resulting in the identification of seven distinct factors within the education dataset. The obtained factors are shown in Table 2. Together, the seven factors explain 60.7% of the total observed variance in the education dataset.
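The two selection rules described here, the Kaiser criterion and the 0.4 loading threshold, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the eigenvalues and loadings are hypothetical toy values, the function names are invented for this example, the rotation step itself is not implemented, and comparing the absolute value of a loading against the threshold is an assumption.

```python
def kaiser_retain(eigenvalues):
    """Keep components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]


def cumulative_variance(eigenvalues, retained):
    """Share of total variance explained by the retained components."""
    return sum(retained) / sum(eigenvalues)


def items_per_factor(loadings, threshold=0.4):
    """Group items by factor; loadings maps item -> list of loadings."""
    n_factors = len(next(iter(loadings.values())))
    groups = {f: [] for f in range(n_factors)}
    for item, row in loadings.items():
        for f, value in enumerate(row):
            # Assumption: the threshold applies to the loading's magnitude.
            if abs(value) > threshold:
                groups[f].append(item)
    return groups


# Hypothetical eigenvalues of a 4-variable correlation matrix.
eigs = [3.0, 1.5, 0.9, 0.6]
kept = kaiser_retain(eigs)
explained = cumulative_variance(eigs, kept)

# Hypothetical rotated loadings for four questions on two factors.
toy_loadings = {
    "q1": [0.82, 0.10],
    "q2": [0.45, 0.38],
    "q3": [0.12, 0.77],
    "q4": [0.30, 0.35],
}
groups = items_per_factor(toy_loadings)
```

In this toy case the question `q4` loads below the threshold on both factors and is therefore not assigned to either.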
Once the questionnaire has been factored, it is vital to study how the identified factors have influenced the different socioeconomic groups. The spectral clustering described in the Methods section was used to identify the dominant mode, which corresponds to the number of social groups that form clusters in the dataset. The dominant mode was found to be three, as the maximum eigengap lay between the third and fourth eigenvalues, which means that the dataset can primarily be grouped into three distinguishable classes 68.
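The eigengap rule used to choose the number of clusters can be illustrated with a short Python sketch. It assumes that the eigenvalues of the graph Laplacian have already been computed and that the number of clusters is read off as the position of the largest gap between consecutive eigenvalues in ascending order; the function name `estimate_num_clusters` and the eigenvalues are hypothetical and are not taken from the study.

```python
def estimate_num_clusters(eigenvalues):
    """Return k such that the largest gap lies between eigenvalues k and k+1."""
    ordered = sorted(eigenvalues)  # ascending order, smallest first
    if len(ordered) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    # gaps[i] separates the (i+1)-th and (i+2)-th eigenvalues (1-indexed).
    largest = max(range(len(gaps)), key=lambda i: gaps[i])
    return largest + 1


# Hypothetical Laplacian spectrum: three small eigenvalues, then a jump.
toy_spectrum = [0.0, 0.02, 0.05, 0.60, 0.70, 0.75]
k = estimate_num_clusters(toy_spectrum)
```

With this toy spectrum the largest gap sits between the third and fourth eigenvalues, which mirrors the situation reported for the education dataset.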
After the number of underlying clusters was determined, the individuals belonging to each cluster were separated using the K-means algorithm 69. Upon examining the demographics, it was found that these clusters were distinguished by the variation in their average income throughout the three waves of the pandemic. Accordingly, these individuals were classified as "High", "Mid", and "Low" based on their average income level (see Fig. 2).
Labeling people by their pre-pandemic income level can be considered more reasonable and unbiased, because incomes may have deviated by anything from a small to a large amount during the pandemic owing to the economic downturn experienced by Sri Lanka and the heightened job insecurity resulting from multiple lockdowns. On that criterion, people with a high average pre-pandemic income were considered the high-income class, and the remaining groups were labeled mid- and low-income accordingly.
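The labeling step can be sketched as follows, assuming that each respondent already has a cluster assignment (for example from K-means) and a pre-pandemic income value. Clusters are ranked by their mean income and named in ascending order. The function name `label_clusters_by_income`, the cluster identifiers, and the income figures are hypothetical and serve only as an illustration of the criterion.

```python
from collections import defaultdict
from statistics import mean


def label_clusters_by_income(cluster_ids, incomes, names=("Low", "Mid", "High")):
    """Map each cluster id to a name according to its mean income rank."""
    by_cluster = defaultdict(list)
    for cid, income in zip(cluster_ids, incomes):
        by_cluster[cid].append(income)
    if len(by_cluster) != len(names):
        raise ValueError("number of clusters must match number of names")
    # Sort cluster ids from lowest to highest mean income.
    ranked = sorted(by_cluster, key=lambda cid: mean(by_cluster[cid]))
    return dict(zip(ranked, names))


# Hypothetical cluster assignments and pre-pandemic incomes.
toy_clusters = [0, 0, 1, 1, 2, 2]
toy_incomes = [90, 110, 20, 30, 50, 60]
labels = label_clusters_by_income(toy_clusters, toy_incomes)
```

Because the names are attached after clustering, income plays no part in forming the groups; it is used only to interpret them.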
Subsequently, a comprehensive analysis was conducted to explore how the identified factors affected individuals' lives and lifestyles and how those effects manifested across diverse groups. As mentioned previously, the factors identified in the first part of the study (factor analysis) were used for unsupervised clustering, which resulted in three groups. These three groups were observed to correspond to the high-, middle-, and low-income subjects. Thereafter, the outcomes of factor identification and unsupervised clustering were combined to analyze how the impact of each factor varied across the demographic groups, which involved examining their demographics and the reasons behind these consequences. Focusing on the above social groups, the authors examined the average impact of the identified factors on the different income categories. Figure 3 shows how the seven extracted factors influenced the identified data-driven clusters, which are primarily distinguishable by income level. In the second phase of this study, the authors considered the demographics of the primary respondents, who represent households and play a crucial role in shaping their children's education. The intention was to understand the household's background and how it affected the factors impacting education. Therefore, among the diverse demographic details available in the dataset, particular emphasis was placed on marital status, education level, and employment sector. The results of Fig. 3 are further explained later in this section and in the discussion.
The marital status of the parents is illustrated in Fig. 4(a), which distinguishes five marital statuses: Married, Divorced, Widowed, Separated, and Unmarried. It can be observed that the number of divorced households is comparatively large among high-income families. Among Widowed families, low-income families outnumber all the other socioeconomic groups. Additionally, the mid-income group accounts for a higher percentage of the remaining marital statuses than the other two groups.
Next, Fig. 4(b) provides insights into the education level of the parents of the surveyed households. According to this plot, most low-income respondents have not completed their secondary or higher education. In low-income households, 6% of the main respondents never had the opportunity to attend any form of formal education, a higher share than in the other two income groups. The uneducated percentage in middle-income households was less than one-fifth of that in the low-income families surveyed. Moreover, a greater number of low-income families attained education up to Ordinary Level (O/L) compared to the other two socioeconomic categories (see Fig. 4(b)). Additionally, there is a clear trend of higher rates of Advanced Level (A/L) education among middle-income families. In contrast, individuals in high- and middle-income households exhibit the highest levels of educational achievement, including diplomas, undergraduate degrees, master's degrees, and even PhDs.
From Fig. 3, it can be observed that the variations in factor impact can be classified into distinct categories based on their relationship with financial strength. These categories are: factors with strong linear relationships that are directly associated with financial strength, such as Network access, Resource affordability, and Household environment; factors with slightly linear relationships, such as Device accessibility and know-how; factors independent of income, namely a person's inherent qualities such as self-motivation; and factors that vary non-linearly with income, such as socio-cultural and emotional, and teacher availability and preparedness.