Research into emotion detection has gained substantial traction in computational linguistics over the past two decades (Picard, 1995/2000; Chuang & Wu, 2002; Mihalcea & Liu, 2006; Ahmad, 2008; Strapparava & Mihalcea, 2008; Chen et al., 2009; Lee et al., 2009; Lee et al., 2010). Current investigations in text-based emotion detection span several domains. One prominent avenue examines emotions on online social media platforms. The data sources range from theme-based book reviews and movie comments on platforms like Goodreads (Dimitrov et al., 2015) to the unfiltered expression of thoughts and sentiments on platforms such as Twitter and Reddit (Demszky et al., 2020). Another noteworthy direction is the analysis of emotions in classical literary works, including fairy tales (Mohammad, 2012). However, a substantial portion of research in this domain relies on monolingual datasets, which limits our understanding of the complexities of emotions in bilingual contexts. While some scholars have recognized the importance of bilingual or multilingual datasets, their approach often involves annotating emotions in a single language and then relying on rudimentary translation methods, such as Google Translate, to render the annotated dataset into other languages (Mohammad & Turney, 2010). This methodology inevitably risks introducing errors and inaccuracies into the translated dataset. Therefore, a more dedicated exploration of the bilingual perspective is needed to enhance emotion detection in Chinese-English language pairs, to analyze emotional nuances in translation, and to improve cross-linguistic and cross-cultural analyses of emotions.
Previous research has employed diverse emotion taxonomies, including Ekman’s (1992) six basic emotions (Joy, Anger, Fear, Sadness, Disgust, and Surprise), Plutchik’s (2003) eight basic emotions (Joy, Sadness, Anger, Fear, Trust, Disgust, Surprise, and Anticipation), Parrott’s (2001) five basic emotions (Joy, Sadness, Anger, Fear, and Love), and the extensive GoEmotions taxonomy with 27 emotion categories (Demszky et al., 2020). The emotion taxonomy in this dataset is founded on established research, which acknowledges that the four primary emotions are Happiness, Sadness, Anger, and Fear (Lee, 2015). This study adopts Parrott’s (2001) classification, which includes Love in addition to the four primary emotions, in keeping with the focus on children’s literature. This choice rests on two considerations: first, Love serves as a cornerstone in the emotional development of children, aiding in the cultivation of empathy and social cohesion (Shaver et al., 1996); second, Love consistently emerges as a prevalent theme in children’s literature, enriching the analysis of emotions and helping to foster positive emotional attitudes among young readers.
Approaches to identifying and analyzing emotions in language fall mainly into three groups: rule-based, machine learning-based, and deep learning approaches. In rule-based emotion analysis, emotion-bearing words and their combinations are used to assess the emotions of phrasal units, and this has long been a primary focus of emotion analysis research (Aman & Szpakowicz, 2007; Chen et al., 2009; Lee et al., 2013). Popular emotion lexicons include the NRC Lexicon (Mohammad & Turney, 2010; Mohammad & Turney, 2013), ANEW (Bradley & Lang, 1999; Nielsen, 2011), and the Valence Arousal Dominance Lexicon (Mohammad, 2018). The machine learning-based approach treats text emotion analysis as a classification task. It applies established algorithms such as Support Vector Machines (SVMs), Naive Bayes, Logistic Regression, and other machine learning methods (Aman & Szpakowicz, 2007; Danisman & Alpkocak, 2008; Deshpande & Rao, 2017). Although bag-of-words models have shown promise in speech emotion recognition (Jain et al., 2020; Kwon et al., 2003) and facial emotion detection (Michel & El Kaliouby, 2003; Susskind et al., 2007), there remains considerable room for refinement in text-based emotion analysis, where data features are sparse. In recent years, deep learning approaches, such as Convolutional Neural Networks (CNNs), Long Short-Term Memory Networks (LSTMs), and Recurrent Neural Networks (RNNs), have gained significant prominence in text emotion analysis (Zhou & Long, 2018). These approaches have attracted attention for their demonstrated ability to address limitations inherent in traditional machine learning techniques.
More recently, transformer-based models that incorporate language model pre-training, such as BERT and the GPT family, have demonstrated significantly improved performance (Guu et al., 2020).
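To make the rule-based approach described above concrete, the following is a minimal sketch of lexicon-based emotion scoring. It is illustrative only: the toy lexicon, its entries, and the function names `emotion_counts` and `dominant_emotion` are assumptions introduced here, not drawn from the NRC Lexicon, ANEW, or any method used in the cited studies. The emotion labels follow the five-category scheme adopted in this study.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion label). The entries are invented
# for illustration and are not taken from any published emotion lexicon.
TOY_LEXICON = {
    "happy": "Joy",
    "laughed": "Joy",
    "cried": "Sadness",
    "lonely": "Sadness",
    "furious": "Anger",
    "afraid": "Fear",
    "wolf": "Fear",
    "hugged": "Love",
}


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how many lexicon words of each emotion occur in the text."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def dominant_emotion(text, lexicon=TOY_LEXICON):
    """Return the most frequent emotion label, or None if no word matches."""
    counts = emotion_counts(text, lexicon)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sentence = "The girl was afraid of the wolf, but her mother hugged her."
print(emotion_counts(sentence))
print(dominant_emotion(sentence))
```

A sketch of this kind also shows why purely lexicon-based methods struggle with sparse data: a sentence that contains no lexicon word receives no label at all, regardless of the emotion it conveys.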
Scholarly interest in the analysis of emotions in children’s literature emerged as early as the late twentieth century (Stevenson, 1997). Previous research in this field has focused on several key areas. One is the detection and recognition of emotions in children’s literary texts, as explored by Alm et al. (2005) and more recently by Zad et al. (2021). Scholars have also examined the socio-cultural implications of emotions depicted in children’s literature (Adukia et al., 2022) and have conducted psychological examinations of hypotheses related to emotions (Jacobs et al., 2020).
The methodologies employed in the analysis of emotions in children’s literature share commonalities with the broader field of emotion analysis. Researchers have employed both rule-based and machine learning-based approaches to investigate emotions in literary works. For instance, Saif Mohammad’s studies in 2010 and 2013 utilized these approaches to scrutinize emotions in novels and fairy tales. His research highlighted the potential of sentiment analysis when combined with effective visualization techniques, enabling the quantification and monitoring of emotions within individual books and extensive literary collections. Similarly, Alm et al. (2005) adopted a supervised machine learning approach, utilizing the SNoW learning architecture, to explore text-based sentiment prediction. Their investigation provided valuable empirical insights into this facet of sentiment analysis. Furthermore, Jacobs et al. (2020) directed their research toward the Pollyanna Effect, employing the model-based unsupervised vector space sentiment analysis tool known as SentiArt. This approach allowed for a nuanced exploration of sentiment dynamics within literary texts, making a significant contribution to the broader discourse on emotion analysis in literature.
However, it is worth noting that recent advancements in transformer-based models, such as BERT and the GPT family, have yet to be applied to emotion analysis in children’s literature. To address this gap in the literature, the current study aims to leverage BERT and GPT family models in conjunction with high-quality data annotated with emotions. This approach seeks to enhance our understanding of emotional content within children’s literature, drawing upon the latest developments in natural language processing techniques.