Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification

Code-mixing and code-switching (CMCS) are frequent features in online conversations. Classification of such text is challenging when one of the languages involved is low-resourced. Fine-tuning pre-trained multilingual language models (PMLMs) is a promising avenue for code-mixed text classification. In this paper, we explore adapter-based fine-tuning of PMLMs for CMCS text classification. We introduce sequential and parallel stacking of adapters, continuous fine-tuning of adapters, and training adapters without freezing the original model as novel techniques with respect to single-task CMCS text classification. We also present a newly annotated dataset for the classification of Sinhala-English code-mixed and code-switched text, where Sinhala is a low-resourced language. Our dataset of 10,000 user comments has been manually annotated for five classification tasks: sentiment analysis, humor detection, hate speech detection, language identification, and aspect identification, making it the publicly available Sinhala-English CMCS dataset with the largest number of task annotation types. In addition to this dataset, we also tested our proposed techniques on Kannada-English and Hindi-English datasets. These experiments confirm that our adapter-based PMLM fine-tuning techniques outperform, or are on par with, basic fine-tuning of PMLMs.

Table 1 Social media language variations of Sinhala-English CMCS data (the same sentence is written in different CMCS forms)

Introduction
The embedding of linguistic components such as words, phrases, and morphemes from one language into an utterance from another language is referred to as code-mixing [13]. In simple terms, code-mixing is the practice of borrowing words from one language and adapting them to another without affecting the topic. The juxtaposition of two grammatical systems or subsystems within the same conversation is referred to as code-switching [13]. An example of Sinhala-English code-mixed and code-switched data is given in Table 1.
Natural language processing (NLP) faces a significant challenge when dealing with code-mixed and code-switched (CMCS) data, as tools developed for a single language might underperform on such data. Handling CMCS data is difficult because of the lack of annotated datasets, the significant number of unobserved constructions created by combining the syntax and lexicon of two or more languages, and the large number of possible CMCS combinations [45]. This problem is exacerbated in the context of low-resource languages (LRLs), where datasets are even more scarce and NLP tools are sub-optimal.
Pre-trained multilingual language models (PMLMs) such as mBERT [24] and XLM-R [9] have attained state-of-the-art performance in most text classification tasks [9], including code-mixed data classification [23]. In previous CMCS text classification research, PMLMs were mostly used with basic fine-tuning and hyperparameter optimization [3,17].
In this paper, we apply the adapter-based PMLM fine-tuning strategy to CMCS text classification. Adapters are neural modules that add a small number of new parameters to a model [30]. During the training phase, the original model parameters are frozen, and only the newly introduced adapter parameters are fine-tuned. Adapters can be either task-specific or language-specific: language adapters are trained to learn language-specific representations, whereas task adapters are trained to learn task-specific representations.
Unlike previous research that used adapters for text classification [12,31,44], we are focusing on CMCS data, which is a mix of multiple languages. Thus, we use different combinations and fine-tuning strategies of both language and task adapters. Specifically, we introduce three ways of using adapters in CMCS data classification: (1) sequential and parallel stacking of language adapters followed by a single task adapter, (2) continuous fine-tuning of task adapters on different pre-trained language models (PLMs), and (3) training task adapters without freezing the original model parameters.
Our solutions are validated on three datasets: a Sinhala-English CMCS dataset newly compiled by us, as well as the publicly available Kannada-English [14] and Hindi-English 1 CMCS datasets. Our Sinhala-English dataset has been annotated for sentiment, humor, hate speech, language ID, and aspect classification, considering all the CMCS phenomena given in Table 1. The dataset can thus be used for five classification tasks, which is the largest number of classification tasks supported by any of the CMCS datasets presented in previous work (see Table 2). In this research, we experimented with this dataset on the first four tasks. Further, we experimented with the sentiment analysis and hate speech detection tasks on the Kannada-English dataset, as well as humor detection and language identification on the Hindi-English dataset.
Experiment results on the XLM-R PMLM show that our adapter-based fine-tuning strategies either outperform or are on par with basic fine-tuning for all the datasets we used. Our third approach yielded the best results on average across all experiments. Given that Sinhala and Kannada are heavily under-represented in XLM-R (meaning that relatively small datasets were used for these languages during XLM-R pre-training), its performance on Sinhala and Kannada CMCS data is impressive. We believe this is because the XLM-R model is able to learn a strong cross-lingual signal from the CMCS data, which can further strengthen the representation of low-resource languages. Thus, such PMLMs should be considered a viable option even for LRLs such as Sinhala and Kannada. This paper makes the following contributions:
1. Three adapter-based fine-tuning strategies on PMLMs for CMCS text classification.
2. A Sinhala-English CMCS dataset annotated for five different tasks. Compared to the existing datasets, this CMCS dataset has the largest number of task annotations. To the best of our knowledge, this is also the first annotated dataset with Sinhala-English CMCS humor and hate speech classifications.
3. The first systematic study of the classification of Sinhala-English CMCS text.
Our dataset, code, and the trained models are publicly available 2,3.

CMCS text classification
For classifying CMCS data, machine learning approaches such as logistic regression, support vector machines, multinomial naive Bayes, K-nearest neighbors, decision trees, and random forests were used in early research [6]. Later, deep learning techniques such as CNNs, LSTMs, and BiLSTMs came into wide use [19]. Most recently, fine-tuning of pre-trained monolingual models such as BERT and multilingual models such as XLM-R and mBERT has been used [15]. Some studies showed that PLMs outperform other deep learning and machine learning techniques [1,5,23], while the opposite was reported in other research [20]. However, Kazhuparambil and Kaushik [20] state that PLMs can be made the top-performing models for CMCS data classification by optimizing hyperparameters.

Annotated corpora for CMCS text classification
CMCS can appear in various forms, including code-switching, inter-sentential and intra-sentential code-mixing, and texts written in both Latin and native scripts. CMCS text classification corpora have mainly been created for Indian languages such as Hindi-English [4], Telugu-English [13], Tamil-English, Kannada-English, and Malayalam-English [6], while there are some corpora for other language pairs such as Sinhala-English [37], Spanish-English [43], and Arabic-English [35]. Except for Chakravarthi et al. [6]'s dataset, others have removed the text written in native script and considered only a subset of code-mixing levels. Moreover, most of the studies have annotated their datasets for a limited set of classification tasks, as shown in Table 2.
In the literature, positive, negative, mixed, neutral, and 'not in intended language' tags have generally been used for CMCS sentiment analysis [6], while the CMCS humor detection task uses a binary tag scheme to indicate whether a text is humorous or non-humorous [22]. For CMCS hate speech detection, some studies used a binary tag scheme [4], while others used a tag scheme containing hate (or hate-inducing), abusive, and not offensive tags [25]. Some studies [14] also extended these tag schemes with additional tags based on the targeted group. On the other hand, aspects are always domain-specific [29]. For CMCS language identification, most research used a tag scheme with tags for the two languages involved, plus a few other tags to represent named entities, URLs, and punctuation marks [2]. However, none of them considered separate tags to identify the hybrid mixing 4 of two languages.

Adapter-based fine-tuning of PLMs
While fine-tuning PLMs has recently become widely used in NLP, fully fine-tuning these models for a specific task is time-consuming, as millions, if not billions, of parameters must be learned. Sharing and storing the resulting models is equally challenging. To address these concerns, "adapters" [16] were introduced as a parameter-efficient fine-tuning strategy. Adapters are small learned bottleneck layers that are inserted within each layer of a PLM and updated during fine-tuning, while the rest of the model remains fixed. It has been shown that adapter-based fine-tuning achieves performance comparable to full fine-tuning on many classification tasks [30], although adapters have resulted in a minor performance decrease for some tasks [30]. There are two popular adapter architectures: the Houlsby adapter [16] and the Pfeiffer adapter [31]. Both are implemented on the Transformer architecture [42]. The Houlsby adapter inserts two down- and up-projection pairs within each transformer layer, whereas the Pfeiffer adapter inserts one. Figure 1 visualizes transformer layers with these adapter layers in comparison to standard transformer layers. However, Pfeiffer et al. [31] showed that there is no significant difference in performance between the two architectures.
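To make the bottleneck structure concrete, the following is a minimal PyTorch sketch of a Pfeiffer-style adapter layer; the hidden size and reduction factor are illustrative defaults, not values taken from the papers cited above.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A minimal Pfeiffer-style bottleneck adapter: one down-projection,
    a non-linearity, one up-projection, and a residual connection.
    (Illustrative sketch; sizes are assumptions, not the papers' values.)"""

    def __init__(self, hidden_size: int = 768, reduction_factor: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction_factor
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Only these small projections are updated during adapter fine-tuning;
        # the surrounding transformer weights remain frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

A Houlsby-style setup would insert two such blocks per transformer layer (after the attention and feed-forward sublayers) instead of one.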
Adapters can be further classified into two types: task adapters and language adapters. Task adapters [16] are trained to learn a specific task representation, whereas language adapters [31] are trained to learn a specific language representation. Language adapters, unlike task adapters, are usually not used alone. When training on a downstream task, the task adapter is stacked on top of the source language adapter for cross-lingual transfer learning. The source language adapter is replaced with the target language adapter at inference time to achieve zero-shot cross-lingual transfer [31]. It is also possible to combine multiple adapters using stacking (sequential) and parallel composition blocks 5 [30,33]. Figures 2 and 3 visualize the stacking and parallel adapter compositions, respectively. To facilitate the cross-lingual transfer of PMLMs, Pfeiffer et al. [31] used a stacking adapter composition in which a task adapter was stacked on top of a language adapter. Wang et al. [44] stacked an ensemble of multiple related language adapters with a task adapter. Parallel processing of adapters was first used by Rücklé et al. [34]. AdapterHub [30] is an open-source 6, easy-to-use, and extensible adapter training and sharing framework that supports both of these adapter types as well as different adapter architectures.

Vanilla fine-tuning of PMLMs
The most successful PMLMs are trained on the Transformer architecture [42]. Encoder-based ones, such as mBERT and XLM-R, are commonly used for classification. These have been trained on a variety of languages with self-supervised objectives such as masked language modeling. As a result, they have to be fine-tuned separately for each downstream task. Vanilla fine-tuning, also known as basic fine-tuning, is the most common method of training them for downstream tasks: the PMLM weights are copied and fine-tuned with task-specific data to learn the task representation.
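As an illustration, vanilla fine-tuning of XLM-R for one of the classification tasks can be set up with HuggingFace Transformers roughly as follows; the label count, hyperparameters, and the pre-tokenized train_dataset/eval_dataset objects are placeholders rather than the exact experimental setup.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=4)  # e.g., a 4-class sentiment task

args = TrainingArguments(output_dir="xlmr-vanilla",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

# train_dataset / eval_dataset: pre-tokenized datasets (assumed to exist)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()  # every weight of the copied PMLM is updated
```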

Basic adapter-based fine-tuning
For a downstream task, a task adapter introduces new, randomly initialized adapter parameters in addition to the initial parameters of the PLM. During fine-tuning, the newly introduced adapter parameters are trained while the original PLM parameters are kept fixed, so that the adapter learns the specific task representation. Since we are implementing multiple classification tasks, we trained a separate task adapter on XLM-R for each classification task. We initially trained adapters with both the Pfeiffer and Houlsby configurations, and continued with the best-performing configuration in further experiments.
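This setup can be sketched with the adapter-transformers library (the library behind AdapterHub); the adapter name, configuration string, and label count below are illustrative.

```python
from transformers import AutoAdapterModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoAdapterModel.from_pretrained("xlm-roberta-base")

# Add a new, randomly initialized task adapter plus a classification head
model.add_adapter("sentiment", config="pfeiffer")  # or config="houlsby"
model.add_classification_head("sentiment", num_labels=4)

# train_adapter() freezes all original XLM-R parameters and leaves only
# the adapter (and head) parameters trainable
model.train_adapter("sentiment")
```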

Stacking language adapters
As mentioned in Sect. 2.3, language adapters are usually used for cross-lingual knowledge transfer and operate on one language at a time, so there is normally no need for multiple language adapters. The exception is Wang et al. [44], who used an ensemble of related language adapters to adapt a PMLM to languages unseen by the model. However, CMCS data contain multiple languages. Thus, as our first technique, we stack language adapters corresponding to the languages present in our datasets (referred to as contributing language adapters in this paper), followed by a task adapter for the corresponding classification task.
In contrast to previous works [31,44], we experimented with stacking multiple contributing language adapters in two different ways:
• Sequential stacking (Fig. 4): We stacked multiple language adapters sequentially, i.e., one language adapter on top of another, to learn representations specific to each language included in the CMCS data. This stack is followed by a task adapter that learns the classification task.
• Parallel stacking (Fig. 5): We arranged the multiple language adapters in a parallel setup, followed by a task adapter, using the parallel adapter composition introduced by Rücklé et al. [34]. While Rücklé et al. [34] used this technique for parallel multi-task inference, we use the same idea to enable parallel inference of languages.
AdapterHub 10 provides pre-trained language adapters for about 50 languages, which can be applied to any downstream task to capture language-specific representations. For these experiments, we first selected the available pre-trained language adapters from AdapterHub for the languages relevant to our experiments. For the languages not available in AdapterHub, we trained new adapters (see Sect. 5).
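With the adapter-transformers composition blocks, the two stacking setups can be sketched as follows; the adapter names are illustrative, and the exact nesting rules depend on the library version.

```python
from transformers import AutoAdapterModel
from transformers.adapters.composition import Stack, Parallel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
# "en", "si", "si-en", and "sentiment" are assumed to be adapters already
# added to or loaded into the model, e.g., via model.load_adapter(...)

# Sequential stacking (cf. Fig. 4): each language adapter feeds into the
# next, and the task adapter sits on top
model.active_adapters = Stack("en", "si", "si-en", "sentiment")

# Parallel stacking (cf. Fig. 5): the language adapters run side by side,
# followed by the task adapter
model.active_adapters = Stack(Parallel("en", "si", "si-en"), "sentiment")
```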

Continuous fine-tuning
Usually, a task adapter is trained on a single PLM. However, in this experiment we used CMCS data that contain multiple languages. Therefore, we trained a task adapter successively on multiple PLMs, each specialized in one of the languages included in the CMCS data. First, we trained the task adapter on two PLMs: a PLM specialized in language 1 (Lang 1 PLM, e.g., SinBERT [11] for Sinhala) and a PLM specialized in language 2 (Lang 2 PLM, e.g., BERT for English). Next, we trained the same task adapter on a PMLM covering both languages (e.g., XLM-R, which is pre-trained on both Sinhala and English).
We tried different training orders and combinations of these PLMs, as shown in Figs. 6 and 7.
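A rough sketch of one such order (BERT, then SinBERT, then XLM-R) with adapter-transformers is given below. All checkpoint names and paths are placeholders, and reusing adapter weights across models is only possible when the architectures are compatible (matching hidden size and layer count).

```python
from transformers import AutoAdapterModel

# Stage 1: train the task adapter on the English PLM, then save it
bert = AutoAdapterModel.from_pretrained("bert-base-cased")
bert.add_adapter("humor", config="pfeiffer")
bert.train_adapter("humor")
# ... fine-tune on the CMCS data ...
bert.save_adapter("adapters/humor", "humor")

# Stage 2: load the same adapter into the Sinhala PLM and keep training
sinbert = AutoAdapterModel.from_pretrained("path/to/sinbert")  # placeholder
sinbert.load_adapter("adapters/humor", load_as="humor")
sinbert.train_adapter("humor")
# ... continue fine-tuning ...
sinbert.save_adapter("adapters/humor", "humor")

# Stage 3: finish on the PMLM that covers both languages
xlmr = AutoAdapterModel.from_pretrained("xlm-roberta-base")
xlmr.load_adapter("adapters/humor", load_as="humor")
xlmr.train_adapter("humor")
```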

Training adapters without freezing the PMLM (Adapters + PMLM training)
As mentioned in Sect. 3.2, the usual practice is to train adapters without training the original model parameters (i.e., freezing the PMLM). Recently, Friedman et al. [12] jointly trained the PLM with the adapters in the context of multi-task learning: when the model no longer improves the validation accuracy (used to compare model performance), the PLM parameters are frozen and adapter training continues. Here, we adapt their solution to the single-task adapter setting. We trained all of the parameters, i.e., both the original model parameters and the parameters newly introduced by the adapters. In our case, we used the macro-F1 score instead of accuracy to compare model performance, as it gives a better interpretation of results in the context of imbalanced datasets, as explained in Sect. 7.2. We also experimented with combining each of the aforementioned techniques.
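In adapter-transformers terms, this amounts to activating the adapter without the usual freezing step; a minimal sketch (names and label count are illustrative):

```python
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
model.add_adapter("sentiment", config="pfeiffer")
model.add_classification_head("sentiment", num_labels=4)
model.set_active_adapters("sentiment")

# Unlike train_adapter(), keep every parameter trainable: the original
# XLM-R weights and the new adapter weights both receive gradients.
for param in model.parameters():
    param.requires_grad = True
```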

CMCS datasets
This section presents the three CMCS datasets used in this study. The tags used in each dataset are listed in Table 3.

Kannada-English dataset
This dataset has been annotated at the sentence level for sentiment analysis and hate speech detection [14], and it contains all types of Kannada-English CMCS variations. Hande et al. [14] reported results for both single-task and multi-task learning with different PLMs, including PMLMs.

Hindi-English dataset
This dataset has been annotated at both the sentence and word levels: humor detection at the sentence level as a binary classification task, and language identification at the word level with three different tags 11. It consists of Hindi-English CMCS data written in Latin script.

The annotated Sinhala-English CMCS corpus
This corpus was newly compiled by us.

Data collection and pre-processing
A raw dataset consisting of 465,314 social media comments was obtained from previous research [7]. First, 15,000 comments were randomly selected, and comments mixed with Tamil 12 were manually filtered out, since only about 0.02% of the comments contained Tamil. After discarding noisy instances such as one-character and integer-only comments, 10,000 comments were randomly selected for annotation.
Since the data have been extracted from social media comments, they contain identifying information such as names of persons, names of organizations, and contact numbers. To protect privacy and comply with ethical requirements, an anonymization scheme was designed and applied to the dataset before making it publicly available. The proposed anonymization scheme is given in Table 4.

Sentiment tag scheme (following the annotation scheme proposed by Senevirathne et al. [36]):
• Positive: the commentator is hopeful or confident and focuses on the positive aspect of a situation rather than the negative aspect (e.g., "I enjoyed it a lot.")
• Negative: the commentator is pessimistic about a situation or experience that is unpleasant or depressing (e.g., "Slow network")
• Neutral: the comment lacks sentiment, or the commentator does not express a thing as good or bad (e.g., "Please send your contact number")
• Conflict: the commentator uses the same comment to describe something as good and something as bad

Data annotation
The selected 10,000 comments were annotated according to the tagging schemes given in Tables 5-9. The aspect tag scheme followed the annotation schema proposed by Chathuranga and Ranathunga [7], which uses data particular to the telecommunication domain, while the language identification tag scheme used an extended version of the annotation scheme proposed by Smith and Thayasivam [37]; the last three tags were introduced by us to identify the hybrid mixing of two languages. The dataset was annotated by four annotators. To evaluate the agreement among annotators, Fleiss' Kappa was calculated for the single-label tags, while a separate measure was used for the multi-label tags; the resulting values are given in Table 10. All the values are above 0.6, indicating a substantial inter-annotator agreement between the annotators.

Dataset statistics
The tag set distributions shown in Figs. 8, 9, 10, 11, and 12 indicate that the dataset is imbalanced. The techniques we used to handle this imbalance are described in Sect. 7. The Code-Mixing Index (CMI) proposed by Das and Gambäck [10] is used to measure the level of mixing between the Sinhala and English languages in the created dataset. Our dataset received a value of 11.52 for "CMI-All" (considering all sentences) and a value of 23.77 for "CMI-CS" (considering only code-switched sentences). The calculation of the CMI is given in Appendix A.
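For reference, the per-utterance CMI as defined by Das and Gambäck [10] is commonly stated as

```latex
\mathrm{CMI} =
\begin{cases}
  100 \times \left(1 - \dfrac{\max_i (w_i)}{n - u}\right) & \text{if } n > u \\
  0 & \text{if } n = u
\end{cases}
```

where $w_i$ is the number of tokens of language $i$, $n$ is the total number of tokens, and $u$ is the number of language-independent tokens in the utterance.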
A higher CMI value indicates a higher level of mixing between the languages, whereas CMI = 0 indicates no code-mixing. A comparison of the CMI of our dataset with related datasets in the LinCE benchmark [1] is given in Table 11; according to this comparison, our dataset has a significant level of code-mixing.

Adapter training
• Training task adapters: A task adapter was trained for each classification task, using each dataset. For example, using the Sinhala-English dataset, we trained task adapters for all four classification tasks: sentiment analysis, humor detection, hate speech detection, and language identification. The same was done for the other CMCS datasets.
• Training language adapters: We trained two language adapters, one for Sinhala (Si) using Senevirathne et al. [36]'s Sinhala dataset and the other for Sinhala-English CMCS (Si-En) using our newly created Sinhala-English CMCS training dataset. As further explained in Sect. 7.5, language adapters were not used in the experiments on the Kannada-English and Hindi-English CMCS datasets.
• Using pre-trained language adapters: We used a pre-trained English language adapter (En) trained for XLM-R, which is available in AdapterHub 13.
• Using PLMs for continuous fine-tuning: For Sinhala-English, we used English BERT [21] and SinBERT (a pre-trained RoBERTa model for Sinhala) together with XLM-R. For the Kannada-English and Hindi-English CMCS datasets, we used English BERT [21], IndicBERT [18], and XLM-R. IndicBERT has been trained on 12 Indian languages, including Hindi and Kannada; it was used because there is no specific PLM for Hindi or Kannada that supports adapters 14.
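Training a language adapter follows the same pattern as a task adapter, but with a masked language modeling head and, typically, an invertible adapter configuration. A sketch with adapter-transformers (the checkpoint, corpus, and adapter names are placeholders):

```python
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")

# Language adapters are commonly trained with the invertible Pfeiffer config
model.add_adapter("si-en", config="pfeiffer+inv")
model.add_masked_lm_head("si-en")
model.train_adapter("si-en")
# ... then run a standard MLM training loop over the CMCS corpus ...
```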

Baseline implementation
Long short-term memory (LSTM), bi-directional long short-term memory (BiLSTM), and Capsule Network models proposed by Senevirathne et al. [36] are used as the baselines for the sentiment analysis task, because they reported the best results for Sinhala sentiment analysis. The same baselines are used for the hate speech detection and humor detection tasks on Sinhala-English CMCS data. For Sinhala-English (and Hindi-English) language identification, however, the two-layer bi-directional LSTM model proposed by Toftrup et al. [40] was used as the baseline. The same LSTM and BiLSTM models were applied to the Kannada-English and Hindi-English classification tasks.
LSTM: The model comprises an input layer, an embedding layer, two dropout layers to prevent overfitting, an LSTM layer, and two dense layers with the softmax function to predict the relevant label for a given sentence.
BiLSTM: The input layer, the embedding layer, the bidirectional LSTM layer, the time-distributed dense layer, the flatten layer, and the dense layer with the softmax activation function are the basic components of the BiLSTM model.
Two-layer BiLSTM: First, all the characters in the input string are replaced by vector embeddings. At each subsequent step, the LSTM receives a character embedding and the previous hidden representation. The output of the left-to-right LSTM layer for each character is combined with that of the right-to-left layer. The concatenated vectors feed a second BiLSTM layer that is similar to the first but does not share its parameters. After that, the concatenated vectors pass through a single linear layer, producing a distribution over all the supported languages.
Capsule Network: Three capsule layers, preceded by a convolutional layer, are the main components of this model. Each capsule is instantiated with 16-dimensional parameters, and each capsule layer has 16 filters.
XLM-R: XLM-R is a transformer-based multilingual masked language model pre-trained on text in 100 languages, including Sinhala, Kannada, Hindi, and English, and it has reported state-of-the-art results in cross-lingual classification [9]. In this research, the XLM-R model taken from HuggingFace 15 was initialized with a sequence classification head and then fine-tuned separately with the CMCS dataset for each classification task.
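For comparison, the BiLSTM baseline described above can be sketched in Keras as follows; the vocabulary size, sequence length, and layer widths are placeholders, and the actual models follow Senevirathne et al. [36].

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, num_classes = 30000, 128, 4  # placeholders

model = keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 300),  # fastText-sized embeddings
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(64)),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```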

Data preparation
To overcome the issue of dataset imbalance, random oversampling (ROS) and the synthetic minority oversampling technique (SMOTE) [8] with different sampling ratios were explored as a pre-processing step; in this paper, we present the results of the best oversampling technique. Note that oversampling was performed only on the Sinhala-English dataset, as the other two datasets are free of the data imbalance issue. Table 12 contains the hyperparameters of the models; the remaining hyperparameters were left at their default values. For the LSTM, BiLSTM, and Capsule Network models, fastText word embeddings with 300 dimensions are used in the embedding layer, and categorical cross-entropy is used as the loss function. For those models, results are reported using fivefold cross-validation. For XLM-R and the adapters, each experiment was run five times with different random states and the average was taken; we also used early stopping in these experiments. Precision, recall, and F1-score are given as macro-averages, as they give equal weight to each class and thus a correct interpretation of results in the context of imbalanced data. All the experiments in this paper were carried out in the Google Colab environment 16.
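Random oversampling, for instance, simply duplicates minority-class examples until the class counts are balanced; a minimal sketch is given below (our actual pre-processing also explored SMOTE and several sampling ratios).

```python
from collections import Counter
import random

def random_oversample(texts, labels, seed=42):
    """Duplicate minority-class examples until every class reaches the
    majority-class count (a simple ROS baseline; illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        extra = rng.choices(pool, k=target - count)
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels
```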

Results
The results of experiments carried out are given in Tables 13 and 14.

Baseline results analysis
According to the results shown in Tables 13 and 14, the fine-tuned XLM-R has outperformed the other deep learning models used in each task. This shows XLM-R's ability to recognize cross-lingual relationships in CMCS text classification.

Adapter-based results analysis
Stacking language adapters with task adapters outperformed basic adapter-based fine-tuning in most of the experiments. We experimented with sequential and parallel stacking of various language adapter combinations, such as stacking a single language adapter alone, stacking two language adapters, and stacking all three language adapters in different orders with the task adapter. Among them, En+Si+Si-En performed best for Sinhala-English CMCS. That may be because those language adapters were able to add language knowledge of Sinhala, English, and Sinhala-English CMCS to the model. Furthermore, parallel stacking of language adapters performed better than sequential stacking. This could be because parallel stacking allows for parallel inference of the languages present in CMCS data. Despite outperforming basic adapter-based fine-tuning, this technique outperforms XLM-R only in the hate speech task. Therefore, we did not apply this technique to the Kannada-English and Hindi-English CMCS classifications.
Our continuous fine-tuning approach improved the results of some tasks while providing on-par performance with XLM-R in others. For Sinhala-English CMCS classification, we obtained the best continuous fine-tuning results with the order BERT, SinBERT, and then XLM-R. This technique, however, did not perform well on the Hindi-English and Kannada-English datasets. That could be because we could not find a PLM trained specifically for Hindi or Kannada with an MLM objective.
Adapter fine-tuning without freezing the model gave us the best results in all four tasks across the Sinhala-English, Kannada-English, and Hindi-English classifications. This could be because this technique lets the model learn both the model parameters and the adapter parameters, allowing it to capture more knowledge about the CMCS classification. However, subsequently freezing the model and further training only the adapters did not improve our results, in contrast to Friedman et al. [12].
Finally, combining the aforementioned techniques for Sinhala-English CMCS data did not further improve the results. Therefore, we did not test them with Kannada-English or Hindi-English data.

Misclassified Sinhala-English CMCS text
We carried out an error analysis on the Sinhala-English dataset. In the sentiment analysis task, each sentence is classified into one of four classes. The sentences intended to be classified into the 'conflict' class contain both positive and negative sentiments in a single sentence; moreover, there were only a few data samples for this class. Also, some sentences carry a negative or positive sentiment even though they do not explicitly contain the positive or negative polarity words that the models learn from. This may have made it difficult for the model to determine the positivity or negativity of a sentence. Therefore, even the best-performing model predicted these types of sentences inaccurately. Some examples are shown in Table 15.
Detecting humor and hate speech is challenging since it requires a large amount of external knowledge, such as linguistic and common-sense insights. Given the small number of samples for the positive classes such as humorous, abusive, and hate-inducing (and random oversampling only duplicates already existing examples), the dataset covers only a small portion of those insights. Therefore, even the best-performing model was unable to detect humor and hate speech correctly in some sentences, as seen in Tables 16 and 17. There are also many ambiguous words in Sinhala-English CMCS: some words are present in both languages, but their meanings vary greatly between the two. In particular, when typing Sinhala, people tend to append the characters "k" and "i" to the end of numbers. This results in misclassified words, as shown in Table 18.

Conclusion and future work
In this research, we experimented with a recently introduced lightweight fine-tuning strategy, namely adapters, in different configurations with PLMs and PMLMs. Our results showed that basic fine-tuning of XLM-R outperformed the other deep learning techniques in CMCS data classification, and that XLM-R is a viable option for low-resource languages. Our study also shows that CMCS text classification can benefit from stacking contributing language adapters with task adapters, because language adapters add multilingual knowledge to the model. The proposed adapter-based fine-tuning strategies improve on the results of basic XLM-R fine-tuning, and training adapters without freezing the model produced the best results for CMCS data. We also introduced a comprehensive Sinhala-English CMCS dataset annotated for sentiment, humor, hate speech, aspect, and language ID.
In the future, we intend to apply further improvements to XLM-R and adapters and to develop a multi-task model that classifies all the tasks of a given dataset. We believe that our newly created dataset and research findings will be useful in future CMCS text classification research.
Ethical approval Ethical approval is not applicable. Data annotators were informed about the task prior to the work and were paid according to institution-approved rates.

Consent for publication
We would like to confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. All the authors give their consent to publish the manuscript.
Himashi Rathnayake is a final year undergraduate student in the Department of Computer Science and Engineering at the University of Moratuwa, Sri Lanka. Her research interests include Natural Language Processing, Machine Learning, and Computer Vision.
Janani Sumanapala is a fourth-year undergraduate student in the Department of Computer Science and Engineering at the University of Moratuwa, Sri Lanka. She is currently in her final semester while working as a software engineer at DirectFN Pvt. Ltd. She has completed her undergraduate research project related to Natural Language Processing.
Raveesha Rukshani is a final-year Computer Science and Engineering undergraduate at the University of Moratuwa, Sri Lanka. Her research activities currently focus on Sinhala-English code-mixed and code-switched data classification, based on Natural Language Processing. In addition, she is interested in Machine Learning and Web and Mobile Application Development.

Surangika Ranathunga received her BSc in Engineering (Hons) and MSc in Computer Science from the University of Moratuwa, Sri Lanka. She received her PhD from the University of Otago, New Zealand. She is currently a senior lecturer at the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. Her research interests include Natural Language Processing and Machine Learning.