Data
This study analysed the content of six English textbooks and built a corpus from 150 exam papers for the National High School Graduation Examination. For the qualitative discourse analysis, I selected the e-copy versions of the six English textbooks for Grades 10 to 12 published by the Vietnam Education Publishing House (2018 edition), because these books are used nationally in the Vietnamese public school system, covering over 15 million upper secondary students (London, 2011). For the topic modelling analysis, I selected and downloaded a total of 150 past and practice exam papers for the English component of the National High School Graduation Examination from 2018 to 2020. Table 2 gives a yearly breakdown of the selected English exam papers and their download sources.
Table 2: Distribution of English exam papers in 2018, 2019 and 2020
Year | Official exam papers | Mock exam papers
2018 | 5 | 45
2019 | 5 | 45
2020 | 5 | 45
Total of official and mock exam papers | 150
Source: www.moet.gov.vn; www.dethi.violet.vn; www.deluyenthi.vn; www.downloaddethi.vn
Methodological Procedures: Thematic Discourse And Topic Modelling Analyses
This study employed a corpus-based approach to investigate the alignment between textbooks and test papers for English as a Foreign Language in Vietnam. First, I conducted a thematic discourse analysis of the six selected English textbooks. In this study, ‘discourse’ is understood as the sequence of sentences that together compose the meaning of a whole text, rather than as the pragmatics of language in context (Gee, 2014; Taylor, 2014). A thematic discourse analysis was therefore used to capture the key themes of knowledge and the lexical items presented to learners. I then performed a topic modelling analysis on a corpus of 150 English exam papers to discover the thematic content of all exam items, including the questions, the multiple-choice options and the reading texts, using a statistical model called Latent Dirichlet Allocation (LDA). LDA is an automated text-analysis technique that sorts the content of a large collection of texts into a pre-defined number of topics or themes (Brookes & McEnery, 2018; Jacobi et al., 2016; Rohrer et al., 2017).
There are three key assumptions behind using LDA as the second step in this study. First, each document is presumed to be a mixture of several smaller topics, and each word’s occurrence can be attributed, with some probability, to one of these topics. Second, the analysis of a large collection of texts can be automated so that the words in each document are grouped into topics according to their probability of occurrence. This assumption is important because it allows computational algorithms to assign each word to a topical theme automatically. Such automation is highly efficient and, more importantly, supports the objectivity of the research process (Brookes & McEnery, 2018; King et al., 2017; Rohrer et al., 2017). Finally, and relatedly, the frequency of words and lexical items in the corpus may indicate the content emphasis of the examined documents (Dang et al., 2020; Hsu, 2014; Nation, 2016). This assumption allows the topics identified automatically by LDA to be compared with the topics identified qualitatively in the national textbooks.
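Under the standard formulation of LDA, the first and third assumptions can be written as a mixture model. The notation below (θ for document-topic proportions, φ for topic-word distributions, α and β for the Dirichlet priors) is the conventional one and is not taken from this study; K is the number of topics, which is fixed in advance (four in this study).

```latex
\begin{aligned}
p(w \mid d) &= \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
             = \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k}, \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), \qquad
\phi_k \sim \operatorname{Dirichlet}(\beta), \\
\sum_{k=1}^{K} \theta_{d,k} &= 1, \qquad \sum_{w \in V} \phi_{k,w} = 1 .
\end{aligned}
```

Here θ_{d,k} is the share of topic k in document d and φ_{k,w} is the probability of word w under topic k; the frequency assumption corresponds to estimating φ from how often each word is assigned to each topic.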
The chosen mixed-method design is necessary for this study for two reasons. First, the qualitative analysis establishes the researcher’s knowledge of the pedagogical content and determines the number of topics to be aligned in the analysis of test papers. Indeed, LDA cannot be used without the researcher’s qualitative interpretation, because LDA neither names the topics it identifies nor sets the number of topics to be generated. Second, the quantitative process offers an economical means of analysing textual data at a large scale and allows the findings to be presented as visual diagrams. Despite the reproducibility benefits of the corpus-based approach, an interpretive step remains vital, both to set up LDA in the first place and to give meaning to its results afterwards.
Methodological Considerations With LDA
One concern is that LDA may give weight to a large number of words that occur frequently in English exam questions yet convey little meaning and have little relevance to the thematic content of the exams (Brookes & McEnery, 2018). To ensure consistency and comparability, the textual data were therefore pre-processed before the LDA analysis of the test papers was performed (for a discussion, see Feldman & Sanger, 2007). Once all exam papers had been imported into RStudio (the R environment used for the LDA analysis), all non-English characters were removed by keeping only ASCII characters. All characters in the compiled documents were then converted to lower case for analytical convenience. I also created a list of stop words, that is, words that occur frequently in texts but convey little substantive meaning for the analysis, such as command words, articles and possessives (Rajaraman & Ullman, 2011). The full list of stop words used in this process is given in Appendix A. This pre-processing produced the corpus of words used for the LDA analysis. In LDA, every document is modelled as a combination of topics, and every topic is modelled as a distribution of words. This automated process still requires human input, in that the researcher must first decide how many ‘themes’ the model should represent. Here, the prior discourse analysis of the English textbooks identified the themes and thereby fixed the number of themes, n, for the LDA analysis to generate. The package ‘topic_modelling’ was then run to categorise every cleaned word into the n themes, with word frequencies determining the probability that a word belongs to each theme. Finally, ggplot2, an R visualisation package, was used to produce visual presentations of the findings.
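The three pre-processing steps described above (dropping non-ASCII characters, lower-casing, removing stop words) can be sketched as follows. This is a minimal Python illustration, not the study’s R code: the function name preprocess, the tokenising pattern and the short stop-word list are hypothetical assumptions, and the study’s actual stop-word list is the one in Appendix A.

```python
import re

# Hypothetical stop-word list for illustration only; the study's list is in Appendix A.
# It mixes articles, possessives and exam command words.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "and", "or", "to", "in", "that",
    "his", "her", "their", "its",
    "mark", "letter", "indicates", "correct", "answer",
})

# Alphabetic tokens, allowing internal hyphens or apostrophes (e.g. "long-term").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def preprocess(text, stop_words=STOP_WORDS):
    """Clean one exam document into a list of tokens for topic modelling."""
    # 1. Drop every non-ASCII character (e.g. Vietnamese diacritics in instructions).
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # 2. Lower-case so that 'Forest' and 'forest' are counted as the same word.
    lowered = ascii_only.lower()
    # 3. Split into word tokens.
    tokens = TOKEN_PATTERN.findall(lowered)
    # 4. Remove stop words, which carry little thematic meaning.
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    sample = "Mark the letter that indicates the correct answer: Forests absorb carbon."
    print(preprocess(sample))  # ['forests', 'absorb', 'carbon']
```

Stripping non-ASCII characters can leave short fragments of Vietnamese words (for example, ‘Đọc’ becomes ‘c’); a fuller stop-word list or a minimum token length would be one way to catch such fragments.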
Findings
English textbooks: learning through themes
In examining the six English textbooks, this study found that the knowledge content was organised into four themes, namely ‘our lives’, ‘our society’, ‘our environment’ and ‘our future’. Across these themes, lexical items were further grouped into 33 related topics. MOET (2012) gave two official reasons for choosing these four themes. First, as the target learners are aged 16 to 18, the thematic system aims to reflect the transition from teenager to young adult by highlighting the importance of learners’ individual lives and their relationships with society, the environment and the future. Second, the thematic system takes into account a multicultural context in which Vietnamese learners can learn to use English as a global language. Despite potential semantic overlap across the themes, the researcher kept the original structure of themes as indexed throughout the textbooks so that no textual data would be lost during the alignment check. Table 3 details the themes and topics covered in the examined textbooks.
Table 3
A summary of themes and topics covered in English textbooks
Theme | Grade | Topics
Our lives | 10 | Family life; Your body and you; Entertainment
Our lives | 11 | Generation gap; Relationships; Becoming independent
Our lives | 12 | Leaving school and career; Life stories; Cultural identity
Our society | 10 | Serving community; Inventions; Gender equality
Our society | 11 | Caring for those in need; Vietnam and ASEAN
Our society | 12 | Urbanisation; The mass media; Vietnam and IOs
Our environment | 10 | Preserving the natural environment; Cultural diversity; Eco-tourism
Our environment | 11 | People and the environment in conflict; Global warming; Preserving our heritage
Our environment | 12 | Endangered species; Ecosystems; The Green movement
Our future | 10 | New ways to learn
Our future | 11 | Further education; Health care and longevity; The future of cities
Our future | 12 | Lifelong learning; Artificial Intelligence; The world of work
The ‘our lives’ theme includes topics that reflect the emotional, physical and social aspects of learners’ lives. From Grade 10 to Grade 12, topics are arranged in ascending order of emotional maturity and critical reflection, in line with learners’ growing maturity and changing learning needs. An indicative list of lexical items, grouped by semantic relevance to each topic under the ‘our lives’ theme, is summarised in Table 4 below.
Table 4
Indicative lexical items and topics under ‘our lives’ theme
Grade | Topics | Lexical items | Example of suggested vocabulary
10 | Family life; Your body and you; Music | Related vocabulary: household chores, healthy lifestyles, and entertainment | Benefit, breadwinner, chore, contribute, enormous, equally shared, household finances, ailment, air, acupuncture, cancer, consume, audience, composer, dangdut, critical …
11 | The generation gap; Relationships; Becoming independent | Related vocabulary: family rules; relationships between boys and girls; personal views and opinions | Afford, attitude, bless, brand name, browse, burden, cognitive, constitution, counsellor, extended family, nuclear family, work out, argument, lend an ear, be reconciled with, sympathetic, self-esteem, self-reliant, self-discipline, studious.
12 | Choosing a career; Life stories; Cultural identity | Related vocabulary: hopes, dreams and ambitions after school graduation; life stories, cultural identity; features of Vietnamese culture | Achievement, anonymous, dedication, diagnose, distinguished, humble beginnings, perseverance, respectable, assimilate, cultural practices, diversity, martial spirit, multicultural, national pride, worship, tuition, survival, institution, apprentice…
The ‘our society’ theme shifts the focus from individual learners to the wider community through topics about local surroundings, social issues in the modern world and the globalisation process. An indicative list of lexical items, grouped by semantic relevance to each topic under the ‘our society’ theme, is summarised in Table 5 below.
Table 5
Indicative lexical items and topics under ‘our society’ theme
Grade | Topics | Language items | Example of suggested vocabulary
10 | For a better community; Inventions; Gender and equality | Related vocabulary: volunteers and volunteer work, and priorities for community development; inventions at present and in the future; the role of women and men in modern Vietnamese society | Announcement, apply, balance, by chance, community, concerned, creative, disadvantaged, employment, fortunate, handicapped, earbud, economical, headphones, imitate, inspiration, portable, submarine, Velcro, address, challenge, discrimination, effective, equality, favourable, alert, article, bridegroom, engagement, fertiliser, …
11 | Caring for those in need; Vietnam and ASEAN | Related vocabulary: disadvantaged people; recommendations to improve accessibility for disabled people; ASEAN countries | Accessible, barrier, blind, care, charm, comfortable, community, control, disability, disapproval, discrimination, healthcare, mobility, support, assistance, association, bloc, economy, in accordance with, interference, legal, motto, principle, progress, state-owned…
12 | Urbanisation; The mass media; Vietnam and International Organisations | Related vocabulary: urbanisation; different types of mass media and their functions; international organisations in Vietnam and their functions | Agricultural, centralise, cost-effective, counter-urbanisation, discrimination, downmarket, energy-saving, industrialisation, kind-hearted, sanitation, addicted, advent, cyberbullying, documentary drama, microblogging, tie in, website.
The ‘our environment’ theme covers the preservation of natural habitats and cultural heritage in both Vietnamese and global contexts. The topic ‘cultural diversity’ (Grade 10) is closely connected with the topic ‘cultural identity’ in the ‘our lives’ theme (Grade 12), indicating an overlap of topic-based knowledge across grades. An indicative list of lexical items, grouped by semantic relevance to each topic under the ‘our environment’ theme, is summarised in Table 6 below.
Table 6
Indicative lexical items and topics under ‘our environment’ theme
Grade | Topics | Language items | Example of suggested vocabulary
10 | Preserving the environment; Cultural diversity; Ecotourism | Related vocabulary: different cultural groups in Vietnam; knowledge of diverse cultural groups; pollution and how modern life affects the natural environment; eco-tourism | Affect, alert, altar, application, best man, bride, engagement, contrast, diversity, global warming, greenhouse effect, honeymoon, horoscope, influence, life partner, mass-media, chemical, confusion, consumption, degraded, depletion, eco-system, preservation, vegetation, assignment, eco-friendly, adapt, software, proposal…
11 | Global warming; Our world heritage sites | Related vocabulary: issues about environmental destruction; global warming; Vietnamese world heritage sites | Absorb, atmosphere, carbon footprint, clean-up, doctorate, drought, dynasty, eligible, landscape, archaeological, authentic, boost, catastrophic, college, complex, critical, cruise, diversity, emerge…
12 | The Green movement; Endangered species | Related vocabulary: endangered species; living and non-living things interacting in ecosystems; a green lifestyle | Biomass, bronchitis, conservation, deplete, dispose of, long-lasting, mildew, preservation, purification, replenish, soot, sustainability, resurrect, overwhelming, opportunity, malfunction…
The ‘our future’ theme contains topics related to learners’ personal and professional development and to the development of society in the future. Future-oriented topics are concentrated in Grades 11 and 12, as students approach graduation from high school, indicating that the learning content is age-appropriate. An indicative list of lexical items, grouped by semantic relevance to each topic under the ‘our future’ theme, is summarised in Table 7 below.
Table 7
Indicative lexical items and topics under ‘our future’ theme
Grade | Topics | Language items | Example of suggested vocabulary
10 | New ways to learn | Related vocabulary: how electronic/electrical devices can help in learning | Access, ancestor, concentrate, device, digital, disadvantage, educate, educational, identity, handkerchief, inequality, voice recognition, …
11 | Further education; Cities of the future; Healthy lifestyle and longevity | Related vocabulary: higher education; healthy lifestyles and traditional treatments for common illnesses; life in the future | abroad, academic, analytical, admission, achieve, baccalaureate, collaboration, CV, ecological, institution, liveable, magnificent, profession, subsequence, transcript, overcrowded, …
12 | Artificial Intelligence; The world of work; Lifelong learning | Related vocabulary: lifelong learning; the benefits and drawbacks of AI; the world of work | Well-spoken, unbelievable, tedious, shortlist, habitat, get to grips with, endangered species, algorithm, workforce, qualification, applicant, ambition, flexibility, career, approachable, administrator…
|
The qualitative analysis thus identified four indicative lists of lexical items, one for each of the four learning themes in the six English textbooks for Grades 10 to 12. With four as the pre-defined number of themes, an LDA analysis of the English exam papers was then conducted.
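The study ran LDA through an R package in RStudio. As an illustration only, the sketch below fits LDA in plain Python by collapsed Gibbs sampling, one standard way of estimating the model; it is not the study’s implementation, and the names lda_gibbs and top_words, the hyperparameters (alpha=0.1, beta=0.01, 200 iterations) and the toy documents are assumptions. It shows the two points made in the methodology: the number of topics k must be supplied by the researcher, and topics come back only as numbered indices whose word lists the researcher must interpret and label.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    docs: list of token lists (already pre-processed).
    k: number of topics, chosen in advance by the researcher.
    Returns (theta, phi): theta[d][j] is the share of topic j in document d;
    phi[j][w] is the probability of word w under topic j.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                # n_dj
    topic_word = [defaultdict(int) for _ in range(k)]  # n_jw
    topic_total = [0] * k                              # n_j

    # Start from a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(k)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z=j | rest) proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V*beta)
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                    / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [[(doc_topic[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [{w: (topic_word[j][w] + beta) / (topic_total[j] + v * beta) for w in vocab}
           for j in range(k)]
    return theta, phi


def top_words(phi_j, n=20):
    """Return the n most probable words of one topic, most probable first."""
    return sorted(phi_j, key=phi_j.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical toy documents, not the study's corpus.
    docs = [
        ["forest", "habitat", "emission", "ecosystem"],
        ["parent", "breadwinner", "burden", "custom"],
        ["university", "career", "technology", "abroad"],
        ["equality", "community", "volunteer", "progress"],
    ]
    theta, phi = lda_gibbs(docs, k=4)
    for j, phi_j in enumerate(phi, start=1):
        print(f"theme {j}:", top_words(phi_j, n=4))
```

Because topic numbers are arbitrary, ‘theme 1’ in one run need not correspond to ‘theme 1’ in another, so the numbered themes in each yearly analysis have to be matched to textbook themes by inspecting their word lists.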
English Exam Papers: Testing Through A Mixed Picture
Figure 1 shows a topic modelling analysis of the 50 selected English exam papers from 2018, in which the processed corpus of assessment items was categorised into four themes, labelled automatically as themes 1, 2, 3 and 4. Because theme 1 mixes words related to people and human activities (talk, look, progress, friends) with words related to occupations (teacher, apply, learn, …), it cannot be clearly aligned with any of the four themes previously identified in the English textbooks. In contrast, themes 2, 3 and 4 show a clear alignment in thematic vocabulary with the English curriculum. Under theme 2, 17 of the top 20 relevant items, such as ‘emission’, ‘destroy’, ‘absorb’, ‘habitat’, ‘forest’, ‘ecosystem’, ‘renewable’ and ‘sustainable’, are related to, and specified in, topics such as ‘preserving the environment’, ‘ecotourism’ (Grade 10) and ‘our world heritage sites’ (Grade 11) under the ‘our environment’ theme of the current English curriculum. Similarly, a collection of family-related and self-related words is grouped under theme 3, including ‘responsible’, ‘breadwinner’, ‘parent’, ‘burden’, ‘self’, ‘respect’ and ‘custom’, which are lexical items listed for the topics ‘family life’ (Grade 10), ‘the generation gap’ (Grade 11) and ‘cultural identity’ (Grade 12) under the ‘our lives’ theme. Theme 4 also contains many future-related words among its top 20 relevant items, such as ‘technology’, ‘university’, ‘abroad’, ‘lifelong (learning)’, ‘reward’ and ‘career’, which indicates that theme 4 in the English exam papers is aligned with the ‘our future’ theme in the textbooks. By contrast, only 2 of the top 20 items in theme 1 (‘progress’ and ‘equality’) were found to align with the ‘our society’ theme.
Therefore, the evidence suggests that the 2018 English exam papers correspond to three out of four themes covered by the English textbooks, namely the ‘our environment’, ‘our lives’ and ‘our future’ themes. This finding can be summarised in Table 8 below.
Table 8
4 themes derived from LDA topic modelling, and the most relevant items for the English exam papers in 2018
Automated theme | Aligned theme | Most relevant word items
1 | Unidentified | Help, mean, learn, people, may, part, new, good, many, teacher, look, start, talk, friend, progress, differ, access, apply, equality, world
2 | Our environment | Emission, life, destroy, preserve, chemical, protect, pollute, adapt, absorb, environment, sustain, ecosystem, gas, earth, habitat, renew, forest, climate
3 | Our lives | Self, responsibility, casual, breadwinner, burden, judge, donate, norm, body, children, parent, obey, identity, custom, articulate, initiative, career, pride, respect, stuff
4 | Our future | Digital, technology, university, abroad, lifelong, world, time, reward, know, survive, career, intervene, entrepreneur, support, work, long-term, improvement, differ, job
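The alignment judgements in this section rest on a simple proportion: how many of an automated theme’s top 20 items belong to a textbook theme’s vocabulary (for example, 17 of 20, or 85%, for theme 2 in 2018). A minimal sketch of that count follows, assuming exact, case-insensitive string matching against a hypothetical lexicon; alignment_share is an illustrative name. The study itself matched items by semantic relevance, which exact matching does not capture (it would not link ‘preserve’ to ‘preservation’, for instance).

```python
def alignment_share(top_items, theme_lexicon):
    """Return the top items found in a theme lexicon and their share of all top items.

    Exact, case-insensitive string matching only; semantic relatedness between
    different word forms is not detected.
    """
    lexicon = {word.lower() for word in theme_lexicon}
    matched = [item for item in top_items if item.lower() in lexicon]
    return matched, len(matched) / len(top_items)


if __name__ == "__main__":
    # Hypothetical inputs for illustration, not data from this study.
    top_items = ["forest", "habitat", "teacher", "emission"]
    environment_lexicon = {"forest", "habitat", "emission", "ecosystem"}
    matched, share = alignment_share(top_items, environment_lexicon)
    print(matched, f"{share:.0%}")  # ['forest', 'habitat', 'emission'] 75%
```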
Similarly, Figure 2 shows a theme-based categorisation of the 50 selected English exam papers from 2019. Items under themes 2 and 4 were identified as broadly consonant with the ‘our environment’ and ‘our future’ themes in the textbooks, whilst the results for themes 1 and 3 were inconclusive and showed no clear pattern of alignment. Most lexical items under theme 2 align closely with environment-related topics under the ‘our environment’ theme, such as ‘environment’ and ‘green’ (‘preserving the environment’, Grade 10; ‘the Green movement’, Grade 12), ‘reuse’ and ‘ecotourism’ (‘ecotourism’, Grade 10), and ‘famine’, ‘catastrophe’, ‘habitat’, ‘organic’ and ‘renewable’ (‘global warming’, Grade 11). Similarly, under theme 4, high-frequency words such as ‘equality’, ‘modernity’ and ‘technology’ are compatible in content with topics such as ‘new ways to learn’ (Grade 10), whilst items such as ‘degree’, ‘stress’, ‘health’, ‘tuition’, ‘recruit’, ‘offer’, ‘school’, ‘shortlist’ and ‘attend’ are semantically consistent with ‘further education’ (Grade 11), ‘healthy lifestyle and longevity’ (Grade 11) and ‘the world of work’ (Grade 12), all of which fall under the ‘our future’ theme. By contrast, themes 1 and 3 produced mixed results that did not fall under any of the themes identified in the textbooks. The findings suggest that the 2019 English exam papers addressed only two of the four themes covered by the English textbooks, namely ‘our environment’ and ‘our future’, leaving ‘our lives’ and ‘our society’ as the two unaligned themes. This finding is summarised in Table 9 below.
Table 9
4 themes derived from LDA topic modelling, and the most relevant items for the English exam papers in 2019
Automated theme | Aligned theme | Most relevant word items
1 | Unidentified | Tabloid, support, character, practice, campaign, school, provide, clear, view, head, remote, mean, win, hours, advent, assist, population, profit, interest
2 | Our environment | Environment, green, offer, charcoal, reuse, fresh, award, ban, ecotourism, acknowledge, catastrophe, famine, closest, drop, city, habitat, organic, learning
3 | Unidentified | Strong, drop, accord, holiday, hard, method, newspaper, regret, press, confer, book, meeting, arab, centre, british, new, certain, many, book, mean
4 | Our future | Equality, modern, technology, degree, stress, health, pursue, tuition, quality, workout, recruit, school, shortlist, offer, international, qualify, popular, attend, support
As in the previous analyses, Figure 3 presents a topic modelling analysis of the 50 selected English exam papers from 2020, including the most recent official papers, held in August 2020. Themes 1, 3 and 4 showed no distinct similarity with the themes identified in the textbooks. However, half of the top 20 relevant items in theme 2 fall under the ‘our environment’ theme, including ‘environment’, ‘reuse’, ‘protect’, ‘renew’, ‘support’, ‘preserve’, ‘green’, ‘greenhouse’, ‘emission’ and ‘development’. This is a notable finding because, for the third year in a row, the ‘our environment’ theme in the textbooks appeared as a testing focus of the English exam papers. Overall, the evidence suggests that the 2020 English exam papers focused mainly on the ‘our environment’ theme, with less emphasis on the remaining three themes. This finding is summarised in Table 10 below.
Table 10
4 themes derived from LDA topic modelling, and the most relevant items for the English exam papers in 2020
Automated theme | Aligned theme | Most relevant word items
1 | Unidentified | Take, first, start, nature, world, meet, parent, problem, computer, think, moral, social, life, suit, straw, human, envy, make, say
2 | Our environment | Environment, reuse, people, differ, build, protect, problem, renew, animal, support, good, help, preserve, need, green, (green)house, emission, development, character
3 | Unidentified | Work, see, part, day, develop, play, hours, material, good, human, think, build, money, cause, success, make, text, left, character, use
4 | Unidentified | People, use, differ, mean, even, look, text, spider, new, normal, requirement, concern, writer, well, positive, suggestion, quit, become, prize, committee
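Tables 8 to 10 can be read together by counting, for each textbook theme, the number of years in which it was aligned with an automated theme. The short sketch below performs this tally from the aligned themes reported in the three tables; the names ALIGNED and years_aligned are illustrative. It makes the cross-year pattern explicit: ‘our environment’ is aligned in all three years, ‘our future’ in two, ‘our lives’ in one and ‘our society’ in none.

```python
from collections import Counter

# Aligned textbook themes per year, as reported in Tables 8 to 10.
ALIGNED = {
    2018: ["our environment", "our lives", "our future"],
    2019: ["our environment", "our future"],
    2020: ["our environment"],
}
TEXTBOOK_THEMES = ["our lives", "our society", "our environment", "our future"]


def years_aligned(aligned_by_year, themes):
    """Count, for each textbook theme, the number of years in which it was aligned."""
    counts = Counter(
        theme for themes_in_year in aligned_by_year.values() for theme in themes_in_year
    )
    # Counter returns 0 for themes that never appear.
    return {theme: counts[theme] for theme in themes}


if __name__ == "__main__":
    for theme, n in years_aligned(ALIGNED, TEXTBOOK_THEMES).items():
        print(f"{theme}: aligned in {n} of {len(ALIGNED)} years")
```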