Mitigate Gender Bias Using Negative Multi-task Learning

Deep learning models have showcased remarkable performance in natural language processing tasks. While much attention has been paid to improvements in utility, privacy leakage and social bias are two major concerns arising in trained models. In this paper, we address both privacy protection and gender bias mitigation in classification models simultaneously. We first introduce a selective privacy-preserving method that obscures individuals' sensitive information by adding noise to word embeddings. Then, we propose a negative multi-task learning framework to mitigate gender bias, which involves a main task and a gender prediction task. The main task employs a positive loss constraint for utility assurance, while the gender prediction task utilizes a negative loss constraint to remove gender-specific features. We analyze four existing word embeddings and evaluate them on sentiment analysis and medical text classification tasks within the proposed negative multi-task learning framework. For instance, RoBERTa achieves the best performance, with an average accuracy of 95% for both negative and positive sentiment at disparity scores of 1.1 and 1.6 respectively, and GloVe achieves the best average accuracy of 96.42% with a 0.28 disparity score on the medical task. Our experimental results indicate that our negative multi-task learning framework can effectively mitigate gender bias while maintaining model utility for both sentiment analysis and medical text classification.


Introduction
Recent developments in Natural Language Processing (NLP) have achieved significant success with enormous text data. However, social biases such as racism and sexism may exist in the text data, and classifiers trained and evaluated on these data will produce unfair outcomes. Sentiment analysis aims to identify opinions and sentiments in customer feedback on products and services expressed on social media or review sites [1]. It is widely used in marketing and customer relationship management. However, sentiment analysis algorithms may perform differently for males and females, potentially misleading decision making [2]. [3] and [4] have demonstrated that word embeddings trained on human-generated text data encode human biases in their vector spaces. For example, the word "programmer" is gender-neutral by definition, but models usually associate "programmer" more closely with "male" than with "female" [5]. Such biases affect downstream applications. Models trained on the source data not only encode but even amplify the bias present in the dataset [6].
Another major concern is the handling of sensitive information during the training and testing phases of a model. Without privacy-preserving methods, some individuals' information, such as gender, race, and age, might be leaked from a model learned on the training data. For instance, sensitive information leakage can occur through model inversion attacks [7]. In such scenarios, an attacker, armed with some knowledge of the model's structure and access to its outputs, could potentially reconstruct aspects of the original training data. In other words, they could infer sensitive attributes about individuals in the dataset, such as gender, race, and age, based on the model's responses to certain inputs. In the context of medical data, these concerns become particularly critical. Models trained on patient records can potentially contain and expose information about a patient's disease status or other sensitive health-related data [8]. Given the impact and implications of such breaches, it is essential to ensure that models used in clinical diagnostics are not biased towards data from certain demographic groups, as such biases can significantly influence the efficacy and fairness of the models [9]. Therefore, the need for privacy-preserving and bias-mitigation methodologies in such sensitive contexts cannot be overstated.
To address these problems, we propose a negative multi-task learning framework that mitigates gender bias while preserving model utility. Traditional multi-task learning frameworks jointly train several related tasks to improve their generalization performance by leveraging shared knowledge among them. We apply positive loss weights to the main classification task to ensure utility and negative loss weights to the gender prediction task to remove gender-specific features. To evaluate the gender bias of classifiers, we use disparity scores to measure the difference in accuracy between males and females. Additionally, we propose a selective privacy-preserving method to protect individuals' sensitive information. In Sect. 4, we conduct experiments on sentiment analysis and medical text classification, where we quantify, analyze, and mitigate gender bias. For sentiment analysis, we apply only the negative multi-task learning framework; the disparity score between males and females drops greatly compared with the baseline models. For medical text classification, we first apply the selective privacy-preserving method to sensitive word embeddings and then use the negative multi-task learning framework to train the model. Our experimental results show that both the privacy-preserving method and the negative multi-task learning framework reduce the disparity score.
The contributions of this paper are as follows:
• We propose a negative multi-task learning framework to mitigate gender bias in text classification models.
• We selectively detect sensitive information in text embeddings, and then perturb the information of each individual in the data.
• To quantitatively measure gender bias, we propose the disparity score, which calculates the difference in a model's accuracy between males and females.
• We evaluate the proposed negative multi-task learning framework on sentiment analysis and medical text classification tasks. We also protect sensitive information in the medical data and provide a comprehensive analysis of our experimental results.

Gender Bias Mitigation Methods
Text corpora used to train NLP models may contain gender, racial and religious biases.
Consequently, word embeddings trained on these datasets retain such biases. Gender bias is the most common bias in NLP applications, and numerous studies have highlighted its presence across various NLP tasks [10][11][12]. [5] proposes a novel training procedure for learning gender-neutral word embeddings. They generate a Gender-Neutral variant of GloVe (GN-GloVe), which tries to remove socially-biased information from certain dimensions while ensuring that the other dimensions are free from this gender effect. Importantly, biases are not only contained in text data and embeddings; they may also exist in learned models even when the data itself is not biased. [13] conducted a comprehensive benchmarking of multiple NLP models, evaluating both their fairness and predictive performance across a range of NLP tasks. Their work primarily targeted the debiasing of embeddings, not classifiers. In contrast, our study focuses on debiasing classifiers. [14] investigated gender biases in machine translation models, demonstrating that social gender assignment indeed impacts translation choices. [15] investigates how excising a small portion of the training corpus impacts the resulting bias. They perturb the training corpus to identify the factors that most significantly influence embedding bias and subsequently remove them from the training corpus. In this paper, we explore the potential of the multi-task learning framework for mitigating gender bias in text classification tasks. Existing multi-task learning frameworks aim to enhance performance across several tasks concurrently. In contrast, we use gender prediction as an auxiliary task and apply negative loss weights to reduce the influence of gender on the main task. The details are described in Sect. 3.2.

Privacy-Preserving Text Embeddings
Text embeddings are distributed representations of text within an n-dimensional space. Models such as Word2vec [16] and GloVe [17] can effectively learn word embeddings from large text corpora, capturing a wealth of semantic relatedness between words. These word embeddings are subsequently used to address most NLP problems. However, recent research has shown that it is possible for attackers to infer information about the training data through learned models [18]. [19] proves that private information can be reconstructed using only text embeddings.
Numerous methods have been proposed to protect individuals' sensitive information. Differential privacy (DP) is a mathematical definition of privacy that provides a formal guarantee balancing privacy and utility. DP usually employs noise injection as a primary technique for data anonymization. Additionally, DP incorporates other methods for maintaining privacy during machine learning model training [20]. For instance, a Gaussian sanitizer introduces Gaussian noise to gradients during training, concealing specific details about individual training examples. Moreover, an amortized moments accountant monitors the overall consumption of the privacy budget during training, ensuring that privacy expenditure is kept within acceptable limits. These mechanisms collectively fortify privacy preservation in DP. However, with DP we often have to sacrifice some utility (accuracy) to ensure privacy.
Another line of prior work uses an adversarial training objective to minimize the risk of adversarial attacks on sensitive information; it can explicitly obscure users' private information [21]. [22] proposes a selective differential privacy method that provides privacy guarantees on the sensitive portion of the data while preserving language model utility.
As there is no universal metric for how privacy should be protected, and in most situations common information does not need to be protected, we mainly focus on the sensitive information of each individual in this paper. We first detect the sensitive words and then introduce noise to the corresponding word embeddings to obscure them, thereby protecting the original sensitive words. The details are described in Sect. 3.3.

Multi-task Learning Framework
Supposing there are T tasks, multi-task learning frameworks aim to solve these tasks simultaneously. These frameworks typically contain two sets of parameters: shared parameters $\theta$ and task-specific parameters $\{\psi_t\}_{t=1}^{T}$. In this paper, the shared layers comprise three dense layers, each with 128 units. A global max pooling operation follows the second layer, and a Dropout layer with a rate of 0.5 follows the third layer. As depicted in Fig. 1, our multi-task learning framework comprises two tasks: a main task and a gender prediction task.
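A minimal Keras sketch of this architecture is given below. The layer arrangement follows the description above; the input shape, the ReLU activations, and the single-unit sigmoid heads are illustrative assumptions, as the paper does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multitask_model(seq_len=128, emb_dim=768):
    # Input: a sequence of pre-computed word embeddings (dimensions assumed).
    inputs = layers.Input(shape=(seq_len, emb_dim))
    x = layers.Dense(128, activation="relu")(inputs)   # shared dense layer 1
    x = layers.Dense(128, activation="relu")(x)        # shared dense layer 2
    x = layers.GlobalMaxPooling1D()(x)                 # global max pooling after layer 2
    x = layers.Dense(128, activation="relu")(x)        # shared dense layer 3
    x = layers.Dropout(0.5)(x)                         # dropout (rate 0.5) after layer 3
    # Two task-specific heads: main classification and gender prediction.
    main_out = layers.Dense(1, activation="sigmoid", name="main_task")(x)
    gender_out = layers.Dense(1, activation="sigmoid", name="gender_prediction")(x)
    return Model(inputs, [main_out, gender_out])
```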

Shared Feature Extraction
The basic multi-task architectures aim to extract common features in the shared lower layers. By sharing information between related tasks, multi-task learning can generalize the model more effectively across the tasks [23].

Task-Specific Layer
Following the shared layers, the remaining layers are split into multiple task-specific branches. The optimization objective of multi-task learning is as follows:

$$\min_{\theta,\, \{\psi_t\}_{t=1}^{T}} \; \sum_{t=1}^{T} \lambda_t \, \mathcal{L}_t(\theta, \psi_t),$$

where $\{\lambda_t\}_{t=1}^{T}$ are the task-specific loss weights. Most previous works constrain $\lambda_t \geq 0$ [24]. However, in order to remove gender features from the main task, we apply a negative loss constraint to the gender prediction task.

Mitigate Gender Bias Using Negative Multi-task Learning
In this paper, our focus is on "gender" bias, a factor of frequent concern in fairness. Algorithm 1 demonstrates the process of negative multi-task learning for mitigating gender bias. We first acquire word embeddings from embedding models, which serve as input to the negative multi-task learning model. After common features have been extracted through the shared layers, two outputs are produced: the main task classification and the gender prediction. The final loss is calculated as

$$\text{Loss} = L_{\text{main-task}} - \lambda \cdot L_{\text{gender-prediction}},$$

where $\lambda$ represents the gender prediction loss constraint. This parameter can be adjusted to balance the main task's accuracy against gender bias. We discuss the impact of the value of $\lambda$ in Sect. 4.
The objective of the negative multi-task learning framework is to improve the main task classification accuracy while reducing gender prediction accuracy. In this way, the text classification model can remove gender-specific features and be distributed without exposing gender information. This allows the model to avoid learning biases from the training data while still being adequately trained to perform the main task.
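In a framework such as Keras, this negative constraint can be expressed directly as a negative per-task loss weight. The sketch below, reusing build_multitask_model from the sketch above, is one possible realization; the choice of binary cross-entropy for both heads is our assumption, as the paper does not name the per-task losses.

```python
import math

# Positive weight on the main task, negative weight (-lambda) on the gender
# head: minimizing the combined loss then *maximizes* the gender prediction
# loss, pushing gender-specific features out of the shared representation.
lam = math.exp(-5)  # default gender loss constraint (the paper's e^-5)
model = build_multitask_model()
model.compile(
    optimizer="adam",
    loss={"main_task": "binary_crossentropy",
          "gender_prediction": "binary_crossentropy"},
    loss_weights={"main_task": 1.0, "gender_prediction": -lam},
)
```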

Selective Privacy-Preserving Text Embeddings
Machine learning algorithms are utilized to make decisions in various applications. These algorithms rely on large amounts of sensitive individual information to work properly. Sensitive individual information may include attributes such as name, address, email, phone number, age, gender, marital status, race, nationality, religious beliefs, and so on. First, we define a sensitive-information detecting function S(X), where X denotes a word within a sentence. The function S(X) checks the given word against predefined keywords associated with our four privacy attributes. If the word is found within these keyword lists, S(X) returns 1; otherwise, it returns 0. When S(X) yields 1, indicating the presence of sensitive information, we perturb the word embedding corresponding to that word, enhancing privacy protection. The four privacy attributes are gender, race, age, and weight. Each attribute is associated with a specific list of sensitive words or expressions, identified as follows: (1) Gender attribute: includes words related to gender identities, e.g., "he", "she", "female", "male", "her", "his", "man", "woman", "boy", "girl", "father", "mother", "son", "daughter". (2) Race attribute: consists of racial identifiers, e.g., "white", "black", "caucasian", "african", "asian", "hispanic", "latino", "mexican", "european", "chinese", "japanese", "korean", "indian". (3) Age attribute: employs a regular expression (r'\d+ year|years old') to identify mentions of age in the text. (4) Weight attribute: uses another regular expression (r'\d+ pounds \w') to detect references to weight. If S(X) returns 1, we use the perturb function P to obscure the word embeddings.
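A minimal Python sketch of the detector is shown below. The keyword lists and the two regular expressions are taken from the description above; how regex matches on the surrounding text are mapped back to individual word positions is not specified in the paper, so the optional context argument is our assumption.

```python
import re

GENDER_WORDS = {"he", "she", "female", "male", "her", "his", "man", "woman",
                "boy", "girl", "father", "mother", "son", "daughter"}
RACE_WORDS = {"white", "black", "caucasian", "african", "asian", "hispanic",
              "latino", "mexican", "european", "chinese", "japanese",
              "korean", "indian"}
AGE_PATTERN = re.compile(r"\d+ year|years old")       # age attribute
WEIGHT_PATTERN = re.compile(r"\d+ pounds \w")         # weight attribute

def S(word, context=""):
    """Return 1 if the word carries sensitive information for any of the
    four privacy attributes (gender, race, age, weight), else 0."""
    w = word.lower()
    if w in GENDER_WORDS or w in RACE_WORDS:
        return 1
    # Age and weight are phrase-level patterns, checked on the word's context.
    if AGE_PATTERN.search(context) or WEIGHT_PATTERN.search(context):
        return 1
    return 0
```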
The perturbation is defined as

$$P(E) = E + N, \qquad N \in \mathbb{R}^{D_E},\; N_j \sim \mathcal{N}(\mu, \sigma^2),$$

where E represents the original sensitive word embedding of X, and N represents a noise vector drawn from a Gaussian distribution. μ represents the mean of the Gaussian distribution; this value is set to 0 to ensure that the added noise is equally likely to increase or decrease the original values. σ is the standard deviation of the Gaussian distribution, set to 1, controlling the magnitude of the perturbation. D_E represents the dimensionality of the original embedding E and determines the length of the generated noise vector, ensuring that it matches the dimensionality of E.
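A direct NumPy sketch of this perturbation, with the defaults μ = 0 and σ = 1 stated above:

```python
import numpy as np

def P(E, mu=0.0, sigma=1.0, rng=None):
    """Perturb a sensitive word embedding E by adding Gaussian noise
    N ~ Normal(mu, sigma^2) with the same dimensionality D_E as E."""
    rng = rng or np.random.default_rng()
    N = rng.normal(loc=mu, scale=sigma, size=E.shape)
    return E + N

# Example: perturb one 768-dimensional embedding (the dimension of BERT/RoBERTa).
E = np.random.randn(768)
E_private = P(E)
```

After perturbation, the nearest word in embedding space is generally no longer the original sensitive token, which is why the decoded sentence in the example below appears scrambled at the sensitive positions.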
As shown in Algorithm 2, given a dataset D, each sample is a text sequence X_i. We perturb the sensitive word embeddings by adding Gaussian noise. Once the text embeddings are perturbed, the sensitive information is transformed into other, non-sensitive word embeddings. For example, the sentence 'She is 84 years old, 148 pounds history of hypertension and diabetes' might be changed to 'the is load years old, diagnosed pounds history of hypertension and diabetes' after perturbation. Thus, sensitive information is effectively protected. In addition, we compare four word embedding models as input to our negative multi-task learning models: Word2vec, GloVe, BERT, and RoBERTa. Word2vec is one of the most popular techniques for learning word embeddings. It consists of a two-layer neural network trained to reconstruct the linguistic contexts of words, with each unique word in the corpus being assigned a corresponding vector in the space. GloVe is an unsupervised learning algorithm for obtaining vector representations of words and has been shown to perform well across a variety of NLP tasks. It is based on ratios of probabilities from the word-word co-occurrence matrix. It combines the intuitions of count-based models while also capturing the meaningful linear substructures found in prediction-based methods such as Word2vec.

Evaluation
In this section, we evaluate gender bias mitigation for both the sentiment analysis task and the medical text classification task. We scrutinize the impact of the negative multi-task learning framework on mitigating gender bias. Additionally, we compare four prevalent word embeddings and apply the selective privacy-preserving method to the medical text classification task.

Sentiment Dataset
The sentiment dataset is extracted from TripAdvisor reviews of restaurants in the UK, written by both male and female reviewers. This dataset is selected primarily for its rich textual content, allowing us to effectively study and model the intricate nuances of sentiment expression. Furthermore, the gender diversity of the authorship provides an opportune platform to probe and understand the impact of gender bias in sentiment analysis, a critical focus of our study. It is important to note that sensitive information about the authors is not disclosed alongside their reviews; hence, our selective privacy-preserving method is not applied to this dataset. Reviews are rated on a scale of 10 to 50 points, in increments of 10. Based on these ratings, we categorize the data into positive reviews (rating > 30) and negative reviews (rating ≤ 30), as shown in Table 1.

Medical Dataset
The medical transcription dataset contains sample medical transcriptions for various medical specialties, scraped from 'mtsamples.com'. Our choice of this specific dataset is driven by its strong alignment with our study's objectives. Notably, it offers a comprehensive variety of textual data that includes sensitive information, which aptly showcases the real-world implementation and effectiveness of our selective privacy-preserving method in a critical context such as patients' sensitive information. Furthermore, this dataset provides gender attributes, which are indispensable for our multi-task learning approach. It is worth noting that the dataset is highly imbalanced, as shown in Fig. 2. To mitigate the effect of class imbalance on the experimental results, we picked the two most frequent specialties for binary classification to simplify the task. We first removed any invalid samples (where either the transcription or the label is empty), then transformed all texts to lowercase, deleted punctuation, and removed stopwords. The statistics of the processed dataset are shown in Table 2.
BERT and RoBERTa word embeddings are obtained from their pre-trained models, each with a dimensionality of 768. The perturb function applied to sensitive word embeddings uses (0, 1)-Gaussian noise. The default loss constraint λ for gender prediction in the negative multi-task learning framework is set to e^{-5}. All models are trained and tested using 5-fold cross-validation to estimate the performance change caused by the optimization on each set individually. Each model is trained for 100 epochs with a batch size of 32. We use single-task learning models as the baseline models. We examine the impact of both the selective privacy-preserving method and the negative multi-task learning framework on mitigating gender bias.
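A sketch of this evaluation protocol, reusing build_multitask_model and the loss weights from Sect. 3; the data arrays here are random stand-ins for the embedded texts and their main-task and gender labels, with toy dimensions for brevity.

```python
import math
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-ins: 200 samples of 32 tokens with 64-dim embeddings (real runs
# use the embeddings and dimensions described above).
X = np.random.randn(200, 32, 64).astype("float32")
y_main = np.random.randint(0, 2, 200)
y_gender = np.random.randint(0, 2, 200)

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = build_multitask_model(seq_len=32, emb_dim=64)
    model.compile(optimizer="adam",
                  loss={"main_task": "binary_crossentropy",
                        "gender_prediction": "binary_crossentropy"},
                  loss_weights={"main_task": 1.0, "gender_prediction": -math.exp(-5)})
    model.fit(X[train_idx],
              {"main_task": y_main[train_idx], "gender_prediction": y_gender[train_idx]},
              epochs=100, batch_size=32, verbose=0)
    # Per-fold accuracy would be recorded separately for male- and
    # female-authored test samples to compute the disparity score (see below).
```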
In all experiments, we compare the models under identical settings. In the comparative tables presented below, bold font is used to emphasize the highest performances and highlight significant results.

Accuracy Evaluation
For model utility, we employ different metrics for the sentiment analysis task and the medical text classification task. To evaluate each sentiment class separately, we use the F1-score as a measure of the accuracy of the sentiment analysis models; the F1-score is calculated for both negative and positive sentiment. For evaluating the overall performance of the medical text classification task, we use balanced accuracy, defined as follows:

$$\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right),$$

where TP represents True Positives, TN True Negatives, FP False Positives, and FN False Negatives.
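Both metrics are available off the shelf, for instance in scikit-learn; the labels below are purely illustrative.

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

y_true = [0, 0, 1, 1, 1]   # illustrative ground-truth labels
y_pred = [0, 1, 1, 1, 0]   # illustrative predictions

# Balanced accuracy = mean of the true positive rate and true negative rate.
print(balanced_accuracy_score(y_true, y_pred))
# Per-class F1 scores (negative and positive sentiment evaluated separately).
print(f1_score(y_true, y_pred, average=None))
```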

Gender Bias Evaluation
We measure gender bias using the Disparity Score, building on the Equality of Opportunity criterion proposed by [27]: a predictor Ŷ (the labels predicted by the model) satisfies equality of opportunity with respect to a class y if Ŷ and the sensitive attribute Z (such as gender or race) are independent conditioned on the true label Y = y. Intuitively, this says that the true positive rates should be the same for males and females. To measure gender bias analogously to Equality of Opportunity, we utilize pairwise differences in prediction accuracy. The average difference between males and females defines the Disparity Score:

$$\text{Disparity Score} = \frac{1}{k}\sum_{i=1}^{k}\left|Acc_{\text{male}}^{(i)} - Acc_{\text{female}}^{(i)}\right|,$$

where $Acc^{(i)}$ denotes the accuracy of the model built in fold i of the k-fold cross-validation. In our experiments, k = 5. A Disparity Score of 0 indicates that there is no gender bias in the predictions.
The closer the disparity score is to 0, the fairer the models are.
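A one-function sketch of this metric, taking the per-fold accuracies for male- and female-authored samples:

```python
import numpy as np

def disparity_score(acc_male, acc_female):
    """Mean absolute accuracy gap between males and females across k folds."""
    return float(np.mean(np.abs(np.asarray(acc_male) - np.asarray(acc_female))))

# Illustrative per-fold accuracies from 5-fold cross-validation.
print(disparity_score([0.95, 0.94, 0.96, 0.95, 0.93],
                      [0.97, 0.96, 0.97, 0.96, 0.96]))  # -> 0.018
```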

Sentiment Analysis
We organize four test groups to measure the disparity in gender bias between negative and positive sentiment. In the negative multi-task learning framework, the gender prediction loss constraint λ is e^{-5}. Figure 3 displays the gender differences in the sentiment analysis results, indicating that sentiment analysis models perform differently for males and females. The models' performance on male reviews is lower than on female reviews, for both negative and positive sentiment. This implies that sentiment analysis models are better at identifying sentiment from females than from males, and thus it is more challenging to detect sentiment in male-authored reviews. Among the four word embeddings considered, RoBERTa exhibits the highest accuracy and a low disparity score in sentiment analysis. Table 3 reports the disparity score for the four groups: Group-1 is the disparity score for negative sentiment on the single-task learning model; Group-2 for negative sentiment on the negative multi-task learning model; Group-3 for positive sentiment on the single-task learning model; and Group-4 for positive sentiment on the negative multi-task learning model. From Table 3, we can see that positive sentiment has a higher disparity score than negative sentiment across all four word embeddings. This suggests that there is more gender bias in positive sentiment than in negative sentiment. This could potentially be attributed to females using more positive words than males, making it easier for models to detect positive sentiment from females. Conversely, for negative sentiment, males and females appear to use more similar negative words, resulting in less bias when detecting negative sentiment. Across all four word embeddings, the disparity scores of the negative multi-task learning models dropped for both negative and positive sentiment.
In addition, we test the impact of the loss constraint for gender prediction in the negative multi-task learning framework using Word2Vec and GloVe. As shown in Tables 4 and 5, λ = e^{-6} yields the highest disparity score. As the value of λ increases, the disparity score decreases, reaching its lowest value at e^{-5}. An ideal model maximizes classification performance (measured by the F1 score) and minimizes the disparity score (gender gap), so we selected e^{-5} as the default value of λ.

Medical Text Classification
In the medical text classification task, we explore the influence of both the selective privacy-preserving method and the negative multi-task learning method in mitigating gender bias. The comparative results of the different models are presented in Tables 6, 7, 8 and 9. Four models are trained: Model1 represents single-task learning without privacy-preserving handling, serving as the baseline; Model2 represents single-task learning with selective privacy-preserving; Model3 represents the negative multi-task learning framework without privacy-preserving handling; Model4 represents the negative multi-task learning framework with selective privacy-preserving. Our experimental results using Word2vec are presented in Table 6. When comparing Model1 with Model2, a slight performance drop can be observed due to noise addition. However, comparing Model1 with Model2 and Model3 with Model4 respectively, it is clear that the selective privacy-preserving method reduces the disparity score, thus mitigating gender bias. Moreover, Model4 achieves the lowest disparity score while maintaining accuracy. When comparing Model1 with Model3 and Model2 with Model4 respectively, we observe that negative multi-task learning not only reduces the disparity score but also enhances accuracy. Table 7 presents our experimental results with GloVe. Similar to the Word2vec models, negative multi-task learning with the selective privacy-preserving model yields the lowest disparity score and the highest accuracy. The experimental results with BERT in Table 8 and with RoBERTa in Table 9 also reveal that negative multi-task learning with selective privacy-preserving achieves better performance with the lowest gender bias. Among these four word embeddings, GloVe delivers the best performance with a low disparity score. Table 10 illustrates the impact of loss constraints for medical text classification models using Word2vec and GloVe. Consistent with sentiment analysis, λ = e^{-5} achieves the lowest disparity score and acceptable accuracy.

Conclusions
In this paper, we introduced a negative multi-task learning framework to mitigate gender bias in sentiment analysis and medical text classification tasks. We demonstrated the effectiveness of our approach by applying the negative multi-task learning framework to both tasks, comparing four widely used word embeddings (i.e., Word2Vec, GloVe, BERT, RoBERTa). Experimental results revealed that RoBERTa performs best in terms of accuracy and gender bias mitigation in the sentiment analysis task. Additionally, our results confirmed that the proposed negative multi-task learning framework effectively mitigates gender bias: it significantly reduces the disparity score for both negative and positive sentiment (by 2.2 and 3.8, respectively) while achieving the highest F1 score. For the medical text classification task, GloVe demonstrates better performance. The selective privacy-preserving method effectively protects individuals' sensitive information, and our experimental results indicated that integrating it with the negative multi-task learning framework further mitigates gender bias, reducing the disparity score by 1.86 while achieving the best accuracy with GloVe.

Fig. 2
Fig. 2 MTsample dataset statistics (only showing the specialties with more than 50 instances)

Fig. 3
Fig. 3 F1 score of sentiment analysis models for different word embeddings. The top row is Word2Vec (left) and GloVe (right); the bottom row is BERT (left) and RoBERTa (right). λ is e^{-5} in the negative multi-task learning framework

Table 1
The characteristics of the TripAdvisor dataset

Table 3
Disparity score on sentiment analysis (λ is e^{-5} in Group-2 and Group-4)

Table 4
Negative multi-task learning models' performance with different λs for negative sentiment (left) and positive sentiment (right) using Word2Vec

Table 5
Negative multi-task learning models' performance with different λs for negative sentiment (left) and positive sentiment (right) using GloVe

Table 6
Medical text classification disparity score using Word2vec (λ is e^{-5} in Model3 and Model4)

Table 7
Medical text classification disparity score using GloVe (λ is e^{-5} in Model3 and Model4)

Table 8
Medical text classification disparity score using BERT (λ is e^{-5} in Model3 and Model4)

Table 9
Medical text classification disparity score using RoBERTa (λ is e^{-5} in Model3 and Model4)

Table 10
Medical text classification models' performance with different λs using Word2vec