Exploring and Mitigating Gender Bias in Book Recommender Systems with Explicit Feedback

Recommender systems are indispensable because they influence our day-to-day behavior and decisions by giving us personalized suggestions. Services like Kindle, YouTube, and Netflix depend heavily on the performance of their recommender systems to ensure that their users have a good experience and to increase revenues. Despite their popularity, it has been shown that recommender systems reproduce and amplify the bias present in the real world. The resulting feedback creates a self-perpetuating loop that deteriorates the user experience and homogenizes recommendations over time. Further, biased recommendations can reinforce stereotypes based on gender or ethnicity, thus strengthening the filter bubbles that we live in. In this paper, we address the problem of gender bias in recommender systems with explicit feedback. We propose a model to quantify the gender bias present in book rating datasets and in the recommendations produced by the recommender systems. Our main contribution is a principled approach to mitigate the bias in the recommendations. We theoretically show that the proposed approach provides unbiased recommendations despite biased data. Through empirical evaluation on publicly available book rating datasets, we further show that the proposed model can significantly reduce bias without significant impact on accuracy. Our method is model agnostic and can be applied to any recommender system. To demonstrate the performance of our model, we present results on four recommender algorithms: two from the K-nearest neighbors family, UserKNN and ItemKNN, and two from the matrix factorization family, Alternating Least Squares (ALS) and Singular Value Decomposition (SVD).


Introduction
Recommender systems influence a significant portion of our digital activity. They keep the user experience fresh by recommending varied items from a catalog of millions and by adapting their recommendations to the personality and taste of each user. A sound recommender system therefore goes a long way toward improving the quality of the user experience and, hence, the user retention of a digital outlet.
Recommender systems have historically been judged on their accuracy (Herlocker et al, 2004; Shani and Gunawardana, 2011). Even when other factors such as novelty, user satisfaction, and diversity are considered (Hurley and Zhang, 2011; Ziegler et al, 2005a; Knijnenburg et al, 2012), the focus remains on satisfying the information needs of the users. Although of immense importance to the relevance of a recommender system, these criteria do not capture the complete picture. In recent years, the public and the academic community have scrutinized artificial intelligence systems regarding their fairness. It has been observed that the results generated by various recommender systems reflect the social biases that exist in society (Ekstrand et al, 2018; Shakespeare et al, 2020; Boratto et al, 2019). Scholars have focused on identifying, quantifying, and mitigating the bias present in the results generated by recommendation systems. Burke (2017) presents a taxonomy of classes for fair recommendation systems. The author suggests different recommendation settings with fairness requirements such as fairness for only users, fairness for only items, and fairness for both users and items. Our work falls into the fairness-for-only-items category, where bias is shown by a particular set of users against a specific set of items in the dataset. In particular, we are interested in studying and eliminating users' bias against the items associated with a specific gender in recommendation systems.
Bias prevention approaches can be classified according to the phase of the data mining process in which they operate: pre-processing, in-processing, and post-processing methods. Pre-processing methods aim to control distortion of the training set. In particular, they transform the training dataset so that the discriminatory biases contained in the dataset are smoothed, hampering the mining of unfair decision models from the transformed data. In-processing methods modify recommendation algorithms by introducing a fairness constraint in the optimization problem, so that the resulting models do not entail unfair decisions. Lastly, post-processing methods act on the results of the extracted data mining model instead of the training data or the algorithm. The method presented in our work is a hybrid of a pre-processing phase and a post-processing phase.
Two prominent studies have focused on gender bias in recommender systems. The work by Shakespeare et al (2020) establishes the existence of bias in the results of music recommender systems, and the work by Ekstrand et al (2018) focuses on the bias shown by Collaborative Filtering (CF) algorithms when recommending books written by women authors. Both studies establish that CF algorithms produce biased results after being fed data that is biased by various socio-cultural factors. While both works focus only on showing the existence of bias in the presence of the users' implicit feedback, we also consider explicit feedback ratings and the bias that may arise from them. Thus, our model handles the case where items associated with a specific gender receive worse feedback from a set of users than they otherwise would have. We go one step further and propose a model to mitigate these biases by quantifying a particular user's bias and debiasing his or her feedback ratings. We theoretically show that the debiased ratings are unbiased estimators of the true preference of the user. Once the ratings are debiased, they are fed into the recommender algorithms as input to produce recommendations for the desired set of users. Since the recommender system is now fed with the debiased ratings, the resulting recommendations are free from the bias factor and avoid a self-perpetuating loop in the future.
The bias of an individual user reflects his or her taste. However, KNN-based algorithms produce recommendations based on similar characteristics between a set of users, and a naive implementation of these algorithms reflects the bias of one user in the recommendations produced for another user. While not directly comparing the rating history of different users or items, Matrix Factorization algorithms rely on deriving latent factors, which depend on the rating history. Both approaches make the system increasingly biased and homogenized after users interact with their biased recommendations and generate data for the next iteration. The above discussion suggests that, although it is necessary to reflect a user's preference in the recommendations produced for him or her to achieve accuracy, it is equally necessary to prevent the bias of one user from being reflected in the recommendations of another similar user. Our research focuses on this particular objective.
Our debiased ratings ensure that the biases of one user do not affect other users; however, this may lead to a loss of accuracy because the user's own preferences are no longer reflected. We introduce a new step called preference correction, which injects the user's preference parameter into his or her own debiased recommendations to maintain the accuracy of the system. The novelty of our work lies in computing the user's preference parameter, which helps not only in debiasing the ratings but also in maintaining the preferences of users. On the publicly available Book-Crossing dataset (Ziegler et al, 2005b) and Amazon Book Review dataset (Ni et al, 2019), we empirically show that this approach retains a significant reduction in bias and has minimal effect on the accuracy of the system. The bias reflected in the recommendations produced by the UserKNN, ItemKNN, ALS, and SVD algorithms is reduced by as much as 42.39%, 37.65%, 26.51%, and 41.43% respectively for the Amazon dataset and by 37.82%, 30.73%, 24.99%, and 32.34% for the Book-Crossing dataset.
When measured with respect to Root Mean Squared Error (RMSE), the final accuracy loss in the case of the Amazon dataset comes out to be 7.8%, 11.96%, 12.49%, and 10.38% respectively for the four algorithms. In the case of the Book-Crossing dataset, the RMSE loss comes out to be 13.86%, 18.13%, 11.41%, and 12.89% respectively. In particular, the following are our main contributions.

Contributions
• We propose a model to quantify the gender bias in the recommender system when explicit feedback is present.
• We propose a principled approach to debias the given ratings and theoretically show that the debiased ratings are unbiased estimators of the true preference of the user.
• We empirically evaluate our model on publicly available book datasets and show that the approach significantly reduces the bias in the system. To show the generality of our proposed approach, we report results on four algorithms: UserKNN, ItemKNN, ALS, and SVD.
• To further enhance the accuracy of the debiased system, we propose a preference correction approach that respects the user's own preferences in his or her recommendations. We show that the final recommender system significantly reduces the bias in the system while not deteriorating the accuracy much.

Related Works
The problem of gender bias and discrimination has received a lot of attention in recent works (Hajian et al, 2016). Many proposals, such as Pedreschi et al (2008), Pedreschi et al (2009), Ruggieri et al (2010), Thanh et al (2011), Mancuhan and Clifton (2014), and Ruggieri et al (2014), are dedicated to detecting and measuring the existing biases in datasets, while other efforts (Kamiran et al, 2010, 2012; Hajian and Domingo-Ferrer, 2013; Hajian et al, 2014a,b; Dwork et al, 2011; Zemel et al, 2013) focus on ensuring that data mining models do not produce discriminatory results even though the input data may be biased. Most of these works focus on the classical problem of classification. Amatriain et al (2011) discuss the application of various classification methods like Support Vector Machines, Artificial Neural Networks, Bayesian classifiers, and decision trees in recommender systems. Their findings indicate that a more complex classifier need not give better performance for recommender systems, and that more exploration is needed in this direction. Considering "fairness for only users" according to the taxonomy presented by Burke (2017), Boratto et al (2019) and Tsintzou et al (2018) discuss the bias with respect to the preferential recommending of certain items only to the users of a specific gender. While weighted regularization matrix factorization studied in Boratto et al (2019) is only appropriate for implicit feedback, the Group Utility Loss Minimization proposed in Tsintzou et al (2018) [...] Although these works have addressed the issue of fairness of recommender systems with respect to gender, they have done so from the perspective of recommending certain items only to the users of a specific gender. The difference between their work and our study lies in the fact that we focus on the more direct issue of gender bias in recommendations shown to items associated with a specific gender.
Shakespeare et al (2020) highlight the artist gender bias in music recommendations produced by Collaborative Filtering algorithms. The work traces the causes of disparity to variations in input gender distributions and user-item preferences, highlighting the effect such configurations can have on users' gender bias after recommendation generation. Mansoury et al (2020) discuss the biases from the perspective of a specific group of individuals (for example, a particular gender) receiving less calibrated and hence unfair recommendations. Ekstrand et al (2018) explore the gender bias present in book rating datasets. Our work differs from the works by Shakespeare et al (2020), Mansoury et al (2020), and Ekstrand et al (2018) in primarily two respects: (i) we consider explicit feedback as opposed to implicit feedback, and (ii) we propose a principled approach to debias the ratings and theoretically show that the debiased ratings are unbiased estimators of the true ratings.
The research by Leavy et al (2020) focuses on algorithmic gender bias and proposes a framework whereby language-based data may be systematically evaluated to assess the levels of gender bias prevalent in training data for machine learning systems. Our work differs in that their study evaluates gender bias in language and textual data settings, while ours deals with gender bias in a more traditional user-item rating setting.
A couple of works in fair recommender systems focus on improving the exposure of the items belonging to minority groups. They do so by upsampling the items associated with minority groups (Boratto et al, 2021), or by adding more data points to the dataset so as to achieve overall fairness (Rastegarpanah et al, 2019). In contrast, our goal in this paper is to provide a systematic way to keep the bias of one user from affecting the recommendations made to other users. We do so by feeding unbiased ratings of the users to the recommender system. This direction avoids the self-perpetuating loop in the recommender system. Once such a system is deployed, there is no further need for interference by the system to ensure fairness. Further, no existing approach provides a theoretical framework to mitigate gender bias in recommender systems. We believe this is a strong first step in a new direction for fair recommender systems.
The Model

Consider a recommender system having a set of users $\mathcal{U} = \{1, 2, \ldots, U\}$ and a set of items $\mathcal{I} = \{1, 2, \ldots, I\}$. Let $D$ and $A$ denote the sets of items associated with the disadvantaged group and the advantaged group, respectively. For example, in a book recommender system, the books represent the items, and $D$ and $A$ represent the sets of books written by women and men authors respectively. For book recommender systems, researchers have already shown that the data is biased against books by female authors (Ekstrand et al, 2018).
Let $r_{ui} \in [1, R]$ denote the rating that user $u$ has given to item $i$. As opposed to previous works, we consider explicit feedback, wherein biases may arise not only from not rating an item but also from giving the item a bad rating. The user profile $p_u = \{X_u, R_u\}$ represents the set of books ($X_u$) that user $u$ has rated and the ratings ($R_u = \{r_{ui}\}_{i \in X_u}$) given to those books.
The proposed recommender system first pre-processes the data in two steps: 1) it finds the log-bias $\theta_u$ of each user $u$, and 2) it generates the debiased rating $d_{ui}$ for each user $u$ and item $i$ using the bias computed in the first step. We then theoretically show that the debiased ratings are unbiased estimators of the true preferences of the users for the items rated by them. Thus, the debiased dataset can be fed into various recommender algorithms to generate an unbiased predicted rating of a user $u$ for the item $i$, denoted by $\hat{d}_{ui}$. This debiasing step ensures that the existing biases are not amplified further in the system. Our debiasing model is independent of any recommendation algorithm. We show the performance of our debiasing model on both K-nearest neighbors-based algorithms (UserKNN, ItemKNN) and matrix factorization-based algorithms (Alternating Least Squares and Singular Value Decomposition).
In the next step, we use a preference corrector to reintroduce the preferences of a particular user $u$ into his or her own recommendations. This is achieved by producing a user-specific rating $\hat{r}_{ui}$ from the debiased rating $\hat{d}_{ui}$. The recommendations are re-ranked according to the adjusted ratings and presented to the user. This step ensures that the system does not lose accuracy by ignoring the preferences of the users. Figure 1 shows the schematic diagram of our model.

Consider that the ratings $r_{ui}$ are continuous values ranging from 1 to $R$. Mathematically, a biased recommender system can then be represented as follows:

1. Each user $u$, while rating an item $i$, scales down the maximum rating $R$ by $e^{p_{ui}}$. Here $p_{ui}$ is a random variable drawn from a distribution function $P_u(\mathcal{I})$, which has a mean value of $\alpha_u$. $p_{ui}$ represents the logarithm of the true preference of the user $u$ for the item $i$. For the sake of brevity, we call it the log-preference of the user $u$ for the item $i$. Hence $e^{p_{ui}}$ is a representation of the true preference of user $u$ for the item $i$.
2. In case the item is associated with the disadvantaged group, the user $u$ further scales down the rating of the item by a factor $e^{q_{ui}}$. Here $q_{ui}$ is a random variable drawn from a distribution function $Q_u(\mathcal{I})$, which has a mean value of $\beta_u$. $q_{ui}$ represents the logarithm of the bias shown by the user $u$ to the item $i$. For the sake of brevity, we call it the log-bias of the user $u$ for the book $i$. Hence $e^{q_{ui}}$ represents the bias of the user $u$ for the book $i$.
3. For each user $u$, $\beta_u$ is sampled from a distribution function $\Omega(x)$, which governs the global log-bias tendency of the users. We denote the mean value of $\Omega(x)$ by $\gamma$.

Thus, ratings $r_{ui}$ can be expressed as:

$$r_{ui} = \begin{cases} R / e^{p_{ui}}, & \text{if } i \text{ is associated with the advantaged group} \\ R / \left(e^{p_{ui}} e^{q_{ui}}\right), & \text{if } i \text{ is associated with the disadvantaged group} \end{cases} \tag{1}$$

We now present a detailed description of each step.
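The generative model of Equation 1 can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the Gaussian form of $P_u$ and $Q_u$, the `noise` parameter, the group labels `"A"` and `"D"`, and all function names are assumptions made for illustration, since the model only fixes the means $\alpha_u$ and $\beta_u$.

```python
import math
import random


def simulate_rating(R, p_ui, q_ui, disadvantaged):
    """Equation 1: R / e^{p_ui} for advantaged items,
    R / (e^{p_ui} * e^{q_ui}) for disadvantaged items."""
    log_rating = math.log(R) - p_ui - (q_ui if disadvantaged else 0.0)
    return math.exp(log_rating)


def simulate_user(R, alpha_u, beta_u, item_groups, noise=0.1, seed=0):
    """Draw one rating per item for a single user.

    item_groups maps item id -> "A" (advantaged) or "D" (disadvantaged).
    ASSUMPTION: p_ui and q_ui are Gaussian around alpha_u and beta_u;
    the model itself only specifies their mean values.
    """
    rng = random.Random(seed)
    ratings = {}
    for item, group in item_groups.items():
        p_ui = rng.gauss(alpha_u, noise)  # log-preference
        q_ui = rng.gauss(beta_u, noise)   # log-bias (used only for group D)
        ratings[item] = simulate_rating(R, p_ui, q_ui, group == "D")
    return ratings
```

With `noise=0.0`, `R=10`, and $\alpha_u = \beta_u = \ln 2$, an advantaged item is rated 5 and a disadvantaged item 2.5, which shows how the same log-preference yields a lower rating for group $D$. The sketch does not clip ratings to $[1, R]$.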

Estimating the mean value for log-bias
The geometric means of the ratings given by a user $u$ to the items associated with the disadvantaged and advantaged groups, denoted by $r_{ud}$ and $r_{ua}$ respectively, are given by the following expressions:

$$r_{ud} = \Big(\prod_{i \in D \cap X_u} r_{ui}\Big)^{1/|D \cap X_u|} \quad \text{and} \quad r_{ua} = \Big(\prod_{i \in A \cap X_u} r_{ui}\Big)^{1/|A \cap X_u|}$$

Further, the log-bias in the user profile $p_u$ is given by $\theta_u = \ln \frac{r_{ua}}{r_{ud}}$. We use the geometric mean to compute the average rating of a user for the following reasons: 1) it is less biased towards very high scores than the arithmetic mean (Neve and Palomares, 2019), and 2) when cold users are involved, aggregating recommendations using the geometric mean is more robust than using the arithmetic mean (Valcarce et al, 2020).
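The log-bias computation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the dictionary-based data layout, the `"A"`/`"D"` labels, and the choice to return `None` when a user has rated items from only one group are not specified in the text.

```python
import math


def geometric_mean(values):
    """Geometric mean computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def user_log_bias(ratings, groups):
    """theta_u = ln(r_ua / r_ud) for one user profile.

    ratings: item id -> rating given by the user (positive numbers).
    groups:  item id -> "A" (advantaged) or "D" (disadvantaged).
    ASSUMPTION: returns None when either group is empty, because the
    ratio of geometric means is then undefined.
    """
    adv = [r for i, r in ratings.items() if groups[i] == "A"]
    dis = [r for i, r in ratings.items() if groups[i] == "D"]
    if not adv or not dis:
        return None
    return math.log(geometric_mean(adv) / geometric_mean(dis))
```

For a user who rates every advantaged book 8 and every disadvantaged book 4, the function returns $\ln 2 \approx 0.693$; a user who rates both groups equally on average gets 0.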
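Lemma 1 shows that $\theta_u$ is an unbiased estimator of $\beta_u$. With $m = |D \cap X_u|$ and $n = |A \cap X_u|$, taking logarithms of the geometric means and substituting Equation 1 gives:

$$\theta_u = \ln r_{ua} - \ln r_{ud} = \frac{1}{n} \sum_{i \in A \cap X_u} (\ln R - p_{ui}) - \frac{1}{m} \sum_{i \in D \cap X_u} (\ln R - p_{ui} - q_{ui})$$

Taking expectation on both sides:

$$\mathbb{E}[\theta_u] = \mathbb{E}\Big[\frac{1}{n} \sum_{i \in A \cap X_u} (\ln R - p_{ui})\Big] - \mathbb{E}\Big[\frac{1}{m} \sum_{i \in D \cap X_u} (\ln R - p_{ui} - q_{ui})\Big]$$

Using linearity of expectation and some simplification, with $\mathbb{E}[p_{ui}] = \alpha_u$ and $\mathbb{E}[q_{ui}] = \beta_u$, we get:

$$\mathbb{E}[\theta_u] = (\ln R - \alpha_u) - (\ln R - \alpha_u - \beta_u) = \beta_u$$

Once we have the log-bias tendencies of the users, we use them to produce the debiased ratings for the given dataset.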

Debiasing the Dataset
The debiased rating of an item $i$ associated with the disadvantaged group and rated by user $u$ is given as $d_{ui} = r_{ui} e^{\theta_u}$. We now provide the main theorem of our paper.
Theorem 2 $\ln(d_{ui})$ is an unbiased estimator of the log of the true rating of the item $i$.
Proof $\ln(d_{ui}) = \theta_u + \ln(r_{ui}) = \theta_u + \ln R - p_{ui} - q_{ui}$. The last equality is obtained from Equation 1. Taking expectation on both sides and using $\mathbb{E}[\theta_u] = \beta_u = \mathbb{E}[q_{ui}]$:

$$\mathbb{E}[\ln(d_{ui})] = \mathbb{E}[\theta_u] + \ln R - \mathbb{E}[p_{ui}] - \mathbb{E}[q_{ui}] = \ln R - \mathbb{E}[p_{ui}]$$

As we can see, the expected value of $\ln(d_{ui})$ contains only the term representing the true preference of user $u$ for the item. □

Thus, instead of $r_{ui}$, the ratings $d_{ui}$ are fed into the recommender system to generate the predicted unbiased ratings $\hat{d}_{ui}$. Simply removing the bias from the user's ratings could severely affect the system's accuracy because the bias of an individual user reflects their taste. However, the debiasing step helps prevent the bias of one user from affecting the recommendations of other users. Next, we apply preference correction by adjusting the predicted rating of the user with respect to his or her own preference parameter.
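The debiasing rule $d_{ui} = r_{ui} e^{\theta_u}$ can be sketched in a few lines. This is a minimal illustration rather than the authors' code; the function name, the dictionary layout, and the `"A"`/`"D"` labels are assumptions, and $\theta_u$ is taken as an already-computed input.

```python
import math


def debias_ratings(ratings, groups, theta_u):
    """Return debiased ratings for one user.

    Ratings of disadvantaged items ("D") are multiplied by e^{theta_u};
    ratings of advantaged items ("A") are left unchanged.
    """
    factor = math.exp(theta_u)
    return {
        item: rating * factor if groups[item] == "D" else rating
        for item, rating in ratings.items()
    }
```

For a user with $\theta_u = \ln 2$, a disadvantaged book rated 4 is debiased to 8, which matches that user's typical rating for advantaged books. Note that a debiased rating can exceed $R$ when $\theta_u > 0$; the sketch does not clip it.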

Preference Correction to Improve Accuracy
Note that when users are inherently biased against a group of items $D$, naively showing items from $D$ to these users will severely affect the accuracy of the system. The goal of this work is not just to balance the exposure of the items in the two groups but to keep the bias of one user from creeping into the recommendations of another user. This was achieved by debiasing the dataset. Once the debiased ratings are generated, the accuracy of the system is maintained by introducing a correction factor. Although it provides higher accuracy, re-introducing the correction factor may lead to an overall increase in the individual biases. This may look self-defeating at first sight, but the final ratings still have significantly less bias than the original ratings. If we do not introduce the correction factor, users might flock to a substantially more biased platform because of poor accuracy.
The correction is achieved by multiplying the predicted ratings of items associated with the disadvantaged group by a factor $e^{-\theta_u}$. Thus, the final recommended ratings are given as $\hat{r}_{ui} = \hat{d}_{ui} e^{-\theta_u}$. Similar to the calculation of bias in the dataset, we can now compute the bias in the recommendation profile.
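The preference correction $\hat{r}_{ui} = \hat{d}_{ui} e^{-\theta_u}$ followed by re-ranking can be sketched as below. The function name, the `top_n` parameter, and the data layout are hypothetical; the predicted debiased ratings are assumed to come from any recommender trained on debiased data.

```python
import math


def preference_correct(predicted, groups, theta_u, top_n=None):
    """Apply the user's own log-bias to predicted debiased ratings and re-rank.

    predicted: item id -> predicted debiased rating for this user.
    groups:    item id -> "A" (advantaged) or "D" (disadvantaged).
    Only disadvantaged items are scaled by e^{-theta_u}.
    Returns (item, adjusted rating) pairs sorted by rating, highest first.
    """
    factor = math.exp(-theta_u)
    adjusted = {
        item: rating * factor if groups[item] == "D" else rating
        for item, rating in predicted.items()
    }
    ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Because the factor depends only on the user's own $\theta_u$, the correction changes that user's ranking without touching the debiased data that other users' predictions were computed from.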

Bias in recommendation profile
We generate recommendations for the users in the test set $T$. The recommendation profile for a user $u \in T$ is denoted by $\tilde{p}_u = \{\tilde{X}_u, \tilde{R}_u\}$, which represents the set of recommended books ($\tilde{X}_u$) for the user $u$ and their predicted ratings ($\tilde{R}_u = \{\hat{r}_{ui}\}_{i \in \tilde{X}_u}$). Let the sets of items associated with the disadvantaged and advantaged groups be denoted by $\tilde{D}$ and $\tilde{A}$ respectively. The average predicted ratings of the items associated with the disadvantaged and advantaged groups, denoted by $\hat{r}_{ud}$ and $\hat{r}_{ua}$ respectively, are given by:

$$\hat{r}_{ud} = \Big(\prod_{i \in \tilde{D} \cap \tilde{X}_u} \hat{r}_{ui}\Big)^{1/|\tilde{D} \cap \tilde{X}_u|} \quad \text{and} \quad \hat{r}_{ua} = \Big(\prod_{i \in \tilde{A} \cap \tilde{X}_u} \hat{r}_{ui}\Big)^{1/|\tilde{A} \cap \tilde{X}_u|}$$

where $\hat{r}_{ui}$ is the predicted rating given to item $i$ in the recommendation profile generated for a user $u$. The log-bias in the recommendation profile $\tilde{p}_u$, denoted by $\hat{\theta}_u$, is then given by $\hat{\theta}_u = \ln \frac{\hat{r}_{ua}}{\hat{r}_{ud}}$. For an unbiased recommendation profile, $\hat{\theta}_u = 0$. A profile biased against the disadvantaged group will have $\hat{\theta}_u > 0$. We can then compute the overall bias of the recommender system by taking the average over all users, and this average gives us the estimated value of $\gamma$.
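The profile-level and system-level bias measures can be sketched together. This is an illustrative sketch: the function names and data layout are hypothetical, and skipping users whose recommendation profile contains only one group is an assumption, since the text does not say how such profiles are handled.

```python
import math


def profile_log_bias(predicted, groups):
    """Log-bias of one recommendation profile: the mean log rating of
    advantaged items minus the mean log rating of disadvantaged items,
    which equals ln of the ratio of the two geometric means."""
    logs = {"A": [], "D": []}
    for item, rating in predicted.items():
        logs[groups[item]].append(math.log(rating))
    if not logs["A"] or not logs["D"]:
        return None  # ASSUMPTION: undefined when a group is missing
    return sum(logs["A"]) / len(logs["A"]) - sum(logs["D"]) / len(logs["D"])


def system_log_bias(profiles, groups):
    """Average profile log-bias over all users, an estimate of gamma."""
    values = [
        bias
        for bias in (profile_log_bias(p, groups) for p in profiles.values())
        if bias is not None
    ]
    return sum(values) / len(values) if values else None
```

Running `system_log_bias` on the predictions before and after debiasing gives the two numbers from which a percentage reduction in bias can be computed.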

Dataset
To evaluate the proposed model, we run experiments on two publicly available book rating datasets: the Book-Crossing dataset, originally put together by Ziegler et al (2005b), and the Amazon Book Review dataset, put together by Ni et al (2019). We further process these datasets through the following stages:

Book Author Identification
Books in both datasets are identified by their unique ISBNs. We identified the authors of the books via their ISBNs using the following three API services: the Google Books API (accessed 2021-02-24), the ISBNdb API (accessed 2021-02-27), and the Open Library API (accessed 2021-03-02). We could not identify the authors of some of the books and discarded those books from the dataset.

Author Gender Identification
We identified the genders of the authors via their first names. We used Genderize.io (accessed 2021-03-05), an API service dedicated to identifying the gender given the first name of a person. We used a minimum confidence threshold of 90% for gender identification. We could not identify the gender of some of the authors and discarded the books written by those authors from the dataset.
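The 90% confidence rule can be sketched as a filter over an already-parsed API response. This sketch makes no network call. The field names `gender` and `probability` are assumed to match the JSON returned by Genderize.io and should be checked against the service's current documentation; the function name is hypothetical.

```python
def accept_gender(response, min_probability=0.9):
    """Return the predicted gender only if it meets the confidence threshold.

    response: dict parsed from the gender API's JSON reply.
    ASSUMPTION: it carries a "gender" field (possibly None) and a
    numeric "probability" field.
    Returns None when the gender is unknown or the confidence is too low,
    in which case the author's books would be discarded.
    """
    gender = response.get("gender")
    probability = response.get("probability") or 0.0
    if gender is None or float(probability) < min_probability:
        return None
    return gender
```

A reply with probability 0.98 is accepted, while one with probability 0.85, or with no gender at all, is rejected.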

Filtering
We filtered the Book-Crossing dataset to include only those books with at least 50 ratings and only those users who have rated at least 50 books. The Amazon dataset is significantly larger than the Book-Crossing dataset, so we filtered it to include only those books with at least 100 ratings and only those users who have rated at least 100 books. We applied this filtering so that the recommender algorithms have enough data to produce accurate recommendations. The statistics of the filtered datasets are given in Table 1. The number of books written by male authors is almost equal to that of female authors for both datasets.
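The filtering step can be sketched with the standard library. This is one possible reading of the description, not the authors' procedure: counts are computed once on the unfiltered data and both thresholds are applied in a single pass, whereas the text does not say whether the filter is repeated until every remaining book and user meets the threshold. The function name and the `(user, book, rating)` tuple layout are hypothetical.

```python
from collections import Counter


def filter_ratings(rows, min_book_ratings, min_user_ratings):
    """Keep ratings whose book and user both meet the activity thresholds.

    rows: list of (user, book, rating) tuples.
    ASSUMPTION: counts are taken on the unfiltered data, in one pass.
    """
    book_counts = Counter(book for _, book, _ in rows)
    user_counts = Counter(user for user, _, _ in rows)
    return [
        (user, book, rating)
        for user, book, rating in rows
        if book_counts[book] >= min_book_ratings
        and user_counts[user] >= min_user_ratings
    ]
```

Under this reading, the Book-Crossing dataset would be filtered with both thresholds set to 50 and the Amazon dataset with both set to 100.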

Input Bias
We show the distributions of the log-bias tendency ($\theta_u$) of the users in the Amazon dataset and the Book-Crossing dataset in Figure 2. We observe that the mean log-bias tendency over all users in the Amazon dataset (0.176) is higher than that of the Book-Crossing dataset (0.157).

Output Bias
We randomly separate 20% of users in each dataset as the test group. We generate the recommendations for the users in the test group using two K-nearest neighbors-based algorithms, UserKNN and ItemKNN, and two matrix factorization-based algorithms, Alternating Least Squares and Singular Value Decomposition. These algorithms were selected because the accuracy and ranking relevancy of their recommendations were among the highest compared with other algorithms. Hence, coupling our model with them best highlights its effects. We calculate the estimated log-bias ($\hat{\theta}_u$) and the accuracy of the recommendations separately for each algorithm applied to the two datasets. For this, we use two error measures, the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE), and two ranking relevance measures, Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR).
We begin by plotting the log-bias ($\hat{\theta}_u$) distribution for the recommendations produced by the algorithms without employing our debiasing model. Figures 3 and 4 show the Amazon dataset for the K-nearest neighbor family and the matrix factorization family of algorithms respectively, and Figures 5 and 6 similarly present the Book-Crossing dataset. We compute the log-bias by feeding the biased ratings $r_{ui}$ to the recommender algorithms.
We next deploy our model partially. We leave out the preference correction phase and produce the recommendations by feeding the debiased ratings $d_{ui}$ to the algorithms mentioned before. We estimate the mean log-bias tendency $\hat{\theta}_u$ in the recommendations from the debiased predicted ratings $\hat{d}_{ui}$ produced by the algorithms. The log-bias ($\hat{\theta}_u$) distribution for the recommendations produced after partial deployment of the model is depicted in Figures 7 and 8 for the Amazon dataset and in Figures 9 and 10 for the Book-Crossing dataset. As can be seen, there is a significant reduction in log-bias tendency for the UserKNN algorithm (64.38% in the Amazon dataset and 53.67% in the Book-Crossing dataset). However, we also see an increase in error rates on both datasets. This is because the test data itself contains biases.
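The two error measures and the percentage figures reported in this section can be computed as sketched below. The function names are hypothetical, and `percent_reduction` is an assumed reading of how the bias-reduction percentages are derived, namely relative to the value obtained without the model.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error over paired rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error over paired rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_reduction(before, after):
    """Relative reduction of a quantity, in percent of its original value.

    ASSUMPTION: percentages are measured against the 'before' value.
    A negative result means the quantity increased, as an error measure
    does when accuracy is lost.
    """
    return 100.0 * (before - after) / before
```

For example, a mean log-bias that falls from 0.2 to 0.1 corresponds to a 50% reduction, while an RMSE that rises from 1.0 to 1.1 gives a reduction of about -10%, that is, a 10% accuracy loss.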
Finally, we deploy our complete model after adding the preference correction method and repeat the experiment. The log-bias ($\hat{\theta}_u$) distribution for the recommendations produced by the algorithms after deployment of the complete model is depicted in Figures 11 and 12 for the Amazon dataset and in Figures 13 and 14 for the Book-Crossing dataset. The results are summarized in Table 2 for the Amazon dataset and in Table 3 for the Book-Crossing dataset. As can be seen, there is still a significant reduction in mean log-bias tendency, which drops by 42.39% in the Amazon dataset and by 37.82% in the Book-Crossing dataset for the UserKNN algorithm. The accuracy loss, however, is comparatively small (an RMSE loss of 7.8% and 13.86% respectively for UserKNN), making this trade-off advantageous. Figure 15 presents the percentage gain in bias reduction for both datasets. The percentage loss in accuracy is depicted in Figures 16 and 17 for the Amazon and Book-Crossing datasets respectively. The percentage loss in the ranking relevancy metrics is depicted in Figures 18 and 19 respectively.
We next conduct significance testing to validate the log-bias reduction. Tables 4 and 5 show the p-values obtained from left-tail significance tests on the log-bias of the recommendations made for the users in the sample. The p-values for the Amazon dataset show that the bias reduction is significant. For the Book-Crossing dataset, the significance of the bias reduction is less pronounced. One prominent reason is that the test sample for the Book-Crossing dataset was relatively small because of the small number of users in the dataset. In essence, the utility of the recommender system is maintained while the log-bias tendency in the recommendations is reduced.
We further observe that the bias reduction is larger for UserKNN-based recommendations than for ItemKNN-based recommendations. This observation can be attributed to the fact that our model addresses the bias originating from the distortion in ratings on the users' side. It compares the ratings of an item given by a particular user with the appropriately scaled average of ratings given by other users to that item in the dataset. It therefore resonates with the UserKNN algorithm, which predicts the rating of an item for a particular user based on the ratings of that item by his or her peers. The ItemKNN algorithm, on the other hand, predicts the rating of an item for a particular user based on the ratings given to similar items by that user. The model does not sit squarely with ItemKNN. Thus the bias reduction for UserKNN is larger than that for ItemKNN.
We further observe that the bias reduction is larger for the Amazon (AZ) dataset than for the Book-Crossing (BX) dataset. This observation can be attributed to the AZ dataset having a higher input mean log-bias tendency. Further, the AZ dataset has a significantly larger user and item base.
We observe that the accuracy and ranking relevancy loss is, in general, higher for ItemKNN than for UserKNN. This is because the model quantifies the bias of users by comparing the ratings they give to particular items with a scaled average of the ratings given by their peers to those items. This resonates with the UserKNN algorithm, which predicts user ratings for particular items based on the ratings of similar users. Thus the model is better oriented towards the UserKNN algorithm, giving better accuracy and bias reduction in its case. In the case of the matrix factorization algorithms, the accuracy and ranking relevancy losses are comparable, and it is not clear which of the two algorithms is more coherent with the model. We further observe that the accuracy loss on the BX dataset is higher than that on the AZ dataset. This observation can be attributed to the fact that the user and item base of the AZ dataset is larger than that of the BX dataset. Thus, the bias score estimates are more accurate, which provides more accurate predictions of the item scores for the users when reinserted into the recommendations.

Conclusion and Future Work
We proposed a model to quantify and mitigate the bias in the explicit feedback given by the users to different items. We theoretically showed that the debiased ratings are unbiased estimators of the true preferences of the users, and empirically showed a significant reduction in bias [...] with just a 10% decrease in accuracy using the UserKNN algorithm. Similar trends were observed for other algorithms such as ItemKNN, ALS, and SVD. Our model is independent of the choice of these algorithms and can be applied with any recommendation algorithm. We used a book recommender system because we were able to generate the gender information from publicly available APIs. Our model is not restricted to book recommender systems as long as protected attribute information about the items is known. We leave extension

Lemma 1
The expectation of the log-bias $\theta_u$ in the user profile $p_u$ equals the mean value $\beta_u$ of the log-bias of the user $u$.
Proof Let $m = |D \cap X_u|$ and $n = |A \cap X_u|$ denote the number of items associated with the disadvantaged and advantaged groups, respectively, in the user profile $p_u$.

Fig. 2: User log-bias in the original dataset

Fig. 3: Output log-bias in AZ dataset without employing the model under K-nearest neighbour family of algorithms

Fig. 5: Output log-bias in BX dataset without employing the model under K-nearest neighbour family of algorithms

Fig. 7: Output log-bias in AZ dataset with debiasing under family of K-nearest neighbour algorithms

Fig. 12: Output log-bias in AZ dataset with preference correction under family of matrix factorization algorithms

Fig. 13: Output log-bias in BX dataset with reinserting the biases under family of K-nearest neighbour algorithms

Fig. 14: Output log-bias in BX dataset with reinserting the biases under family of matrix factorization algorithms

Fig. 16: Accuracy loss for AZ dataset (a) In terms of NDCG (b) In terms of Reciprocal Rank

Table 1: Statistics of the filtered datasets

Table 2: Summary of Results for Amazon Dataset

Table 3: Summary of Results for Book-Crossing Dataset

Table 4: Significance test results for bias reduction for Amazon Dataset

Table 5: Significance test results for bias reduction for Book-Crossing Dataset