How do item features and user characteristics affect users’ perceptions of recommendation serendipity? A cross-domain analysis

Serendipity is one of the beyond-accuracy objectives for recommender systems (RSs). It aims to achieve both relevance and unexpectedness of recommendations, so as to potentially address the “filter bubble” issue of traditional accuracy-oriented RSs. However, most serendipity-oriented studies so far have focused on developing algorithms that consider various types of item features or user characteristics, largely based on the researchers’ own assumptions. Few have taken the users’ perspective to identify the effects of these features on users’ perceptions of recommendation serendipity. Therefore, in this paper, we analyze their effects with two user survey datasets: the Movielens Serendipity Dataset, containing 467 users’ responses to a retrospective survey of their perceptions of a recommended movie’s serendipity, and the Taobao Serendipity Dataset, containing 11,383 users’ perceptions of the serendipity of a recommendation received on a mobile e-commerce platform. In both datasets, we analyzed the correlations between users’ serendipity perceptions and various types of item features (i.e., item-driven such as popularity, profile-driven such as in-profile diversity, and interaction-driven including category-level and item-level features), as well as the influence of several user characteristics (including the Big-Five personality traits and curiosity). The results reveal both domain-independent and domain-specific observations, which may be constructive in enhancing current serendipity-oriented recommender systems by better utilizing item features and user data.


Introduction
Recommender systems (RSs) serve as popular tools in the current online environment to filter massive information and help users make effective decisions. Traditional RSs (e.g., collaborative filtering-based and content-based RSs) mainly aim to maximize the accuracy of item prediction, but have the potential to create the "filter bubble" problem (Matt et al. 2014;McNee et al. 2006), as recommendations that are too similar to users' previous preferences may stop them from exploring different and new items. Because of this limitation, serendipity-oriented RSs have been proposed, targeted to achieve both relevance and unexpectedness (also called surprise) (Castells et al. 2011;Kaminskas and Bridge 2016). A recent study has shown that increased user perception of recommendation serendipity can significantly lead to higher user satisfaction with the recommendation (Chen et al. 2019b).
The idea behind the existing serendipity-oriented RSs is to capture item features that may satisfy users' needs for pleasant unexpectedness. For example, it was assumed that unpopular items would bring more uncertainty and hence be more unexpected to users, so the importance of unpopular items was increased when generating serendipitous recommendations (Lu et al. 2012a). In other work, it was assumed that the more dissimilar an item is to the user's profile (i.e., the previously visited items), the more unexpected it would be to the user (Kotkov et al. 2018b;Zheng et al. 2015). Users' personal characteristics have also been considered in some work. For instance, highly curious users were assumed to be more likely to accept novel items, while users with less curiosity might prefer to receive recommendations similar to their previous preferences (Menk et al. 2017;Maccatrozzo et al. 2017;Niu and Abbas 2017).
However, to the best of our knowledge, there has been little research looking at what item features and/or user characteristics might affect users' perceived serendipity of the recommendation. Given that serendipity involves users' positive emotional responses (i.e., pleasant unexpected feeling), the lack of validation with real users may impair the applicability of existing serendipity-oriented RSs in real-life situations (Kotkov et al. 2018a). Furthermore, there has been little discussion of the application domain of the proposed hypotheses, ignoring the differences brought by domain-specific properties. For example, in the context of e-commerce, which involves certain financial risks, people may be more cautious when making decisions. Their propensity toward serendipity might therefore differ from that shown when evaluating a low-risk product (such as a movie recommendation). Therefore, to fill this gap in the state-of-the-art research, we have focused on the following three major research questions in this work:
RQ1: What item features can be correlated with users' perceived serendipity of a recommendation?
RQ2: What user characteristics can affect their perceived serendipity, and what is the relative importance of all the considered features (i.e., including item features and user characteristics)?
RQ3: Are there any domain-specific differences in the above effects?
To address these three questions, we conducted analyses on two user survey datasets. One is the publicly available Movielens Serendipity Dataset (Kotkov et al. 2018a), containing 467 users' feedback on serendipity-related objectives of movie recommendations. The other dataset is the Taobao Serendipity Dataset (Chen et al. 2019b;Wang et al. 2020), collected via a large-scale user survey on Mobile Taobao (a popular mobile e-commerce app in China). Specifically, it contains the feedback of 11,383 customers on the serendipity of recommendations they received through the app. Moreover, the dataset has the demographic information of these users (i.e., age and gender) and their psychological profiles (i.e., the Big-Five personality traits and curiosity), so that it is possible to identify their effects on users' serendipity perceptions.
Through statistical approaches, we identified several domain-independent and domain-specific observations. In particular, in both the Movielens and Taobao datasets, items with lower popularity or higher temporal interactions (i.e., items with categories that users have often visited in the same time period) were more likely to be perceived as serendipitous by users. Regarding domain-specific observations, we found that in movie recommendations, the more diverse the item categories in the user profile (i.e., in-profile diversity), the more likely the user was to perceive a recommendation as serendipitous. Movies of categories that the user had interacted with less were also more likely to be perceived as serendipitous. However, the results were the opposite in the Taobao dataset. Regarding user characteristics, we found that users' curiosity level and age were most likely to affect their perceptions of recommendation serendipity.
There are three major contributions of this work:
(1) We tested the impacts of a number of item features and user characteristics on users' perceived recommendation serendipity, in two user survey datasets of different domains (i.e., movies and e-commerce).
(2) We measured the relative importance of different features as well as the influence of profile length on the correlations between item features and users' perceived serendipity.
(3) We identified differences in users' serendipity perceptions of movie and e-commerce recommendations, which could be constructive in developing dedicated serendipity-oriented recommender systems that consider domain specialties.
The remainder is organized as follows. We first introduce the related work on serendipity-oriented RSs, with a summary of item features and user characteristics considered (Sect. 2). We then present the descriptive statistics of the two user survey datasets used in Sect. 3, followed by investigated item features in Sect. 4, user characteristics in Sect. 5, and the results analysis in Sect. 6. Subsequently, we answer the three research questions in Sect. 7, discuss the implications and limitations of our work in Sect. 7.3, and conclude the work in Sect. 8.

Background
The original definition of serendipity is " [...] making discoveries, by accidents and sagacity, of things which they were not in quest for [...]" (Walpole 1960;Bogers and Björneborn 2013;Maccatrozzo et al. 2017). In recommender systems (RSs), it has been considered for breaking through barriers, especially the "filter bubble," of traditional accuracy-oriented approaches. In Kotkov et al. (2016), serendipity-oriented recommendation methods were classified into three categories: re-ranking (post-processing after conventional recommendation algorithms), modification (taking into account new factors to change the objective functions of conventional recommendation systems), and new algorithms (from the framework of conventional recommender systems). In this section, we introduce related methods from the aspects of item features and user characteristics they have considered.

Item features
One of the most commonly considered item features in existing serendipity-oriented RSs is item (un)popularity, given that unpopular items are likely to be unknown to users and hence assumed to be more unexpected than popular ones. For example, Lu et al. (2012a) revised the optimization function of the matrix factorization (MF) by adding a popularity term to lower the importance of popular items. Wang et al. (2018) incorporated item popularity to identify active users in the system and further attached a higher weight to active users' ratings when computing predicted ratings through typical collaborative filtering.
The second type of frequently used item feature is item (dis)similarity to the user profile. It was assumed that the more different/dissimilar an item is from the set of items that the user has visited before, the more unexpected it would be (Zheng et al. 2015). For example, Zheng et al. (2015) combined both item unpopularity and dissimilarity to define unexpectedness and proposed an unexpectedness-augmented PureSVD latent factor model. Kotkov et al. (2018b) also considered both item unpopularity and dissimilarity in their proposed hybrid re-ranking algorithm. In these two works (Zheng et al. 2015;Kotkov et al. 2018b), popularity was defined based on the number of user visits, and item dissimilarity was computed as the average of pairwise item dissimilarities between the currently considered item and those in the user profile. Nakatsuji et al. (2010) proposed a class distance metric based on item taxonomy to compute item similarity, that is, the smallest distance from the target item's class to the classes that the user has recently accessed. Adamopoulos and Tuzhilin (2014) proposed a utility model to calculate an item's unexpectedness, considering the distance of the target item from a set of expected items. More recent serendipity-oriented recommendation approaches, such as those based on graphs (Onuma et al. 2009;De Gemmis et al. 2015;Tuval 2019) or neural networks (Pandey et al. 2018;Li et al. 2020), implicitly leveraged the item (dis)similarity in the process of recommendation generation.
The third type of item feature is temporal information. For example, Chen et al. (2019b) verified that timeliness (i.e., timely recommendation) has a positive effect on users' perceived serendipity. Chiu et al. (2011) employed the access time of users to the target item as one dimension of measuring serendipity, assuming that an earlier access time to a new item leads to more serendipity. Kawamae (2010) attempted to predict when a user would purchase an item and reduce the time cost for users to find that item. The authors anticipated this time-saving strategy to have an effect of unexpectedness.
The fourth type is co-occurrence information, because it was assumed that the higher the co-occurrence probability one item has with other items, the lower the unexpectedness it will trigger. There are two kinds of co-occurrence information: One is the probability that the target item's attributes appear in the user's previously visited items (Akiyama et al. 2010;Niu and Abbas 2017), and the other is the probability that different items appear in the same user's profile (Huang et al. 2018). For instance, Akiyama et al. (2010) defined a TV program's unexpectedness by considering the possibility of its attributes' co-occurrence in all programs. Niu and Abbas (2017) assumed items with attributes less likely to appear together in the target user's profile would generate high-level unexpectedness. Huang et al. (2018) considered context-independent co-occurrence based on the assumption that the higher the probability of two entities appearing in the same query, the stronger their relatedness, as well as context-dependent co-occurrence through a language model that measures the relatedness between two entities' descriptions.
In addition, some papers considered domain-specific item features. For instance, Zhang et al. (2012) considered listener diversity (the entropy of the artist's listener distribution) to enhance the serendipity of their hybrid rank-interpolation music recommendations. Shen et al. (2020) took advantage of music features (e.g., artist, genre, language, release year, acoustic features, and lyrics) to build their serendipity-oriented RS.

User characteristics
According to our literature survey, one of the most commonly considered user characteristics in existing serendipity-oriented RSs is curiosity (Chen et al. 2019b), an important premise of users' appetite for novel knowledge and experiences in the field of psychology (Berlyne 1960;Zhao and Lee 2016). For example, Menk et al. (2017) predicted the target user's curiosity from their data in social networking, and then generated serendipitous recommendations tailored to the predicted curiosity value. Niu and Abbas (2017) proposed a computational serendipity model with the goal of triggering users' curiosity toward the recommended item at the appropriate time. Maccatrozzo et al. (2017) developed a serendipity model based on a curiosity theory to trigger user curiosity by matching the level of the recommendation's novelty to the target user's coping potential for new things.
Another type of user characteristic, also inherently related to user personality, is whether the user is an innovator or not, as this can reflect the person's sensitivity to new items (Kawamae et al. 2009), on the assumption that items purchased by innovators can surprise their followers. Innovators may have two major properties as stated by Wang et al. (2018): high activity and strong ability to discover new items (e.g., taking less time to find unpopular items), and being unlikely to follow the mainstream. In the work of Wang et al. (2018), the recommendations were retrieved from items that the innovators nearest to the current user had interacted with.
Other considered characteristics include user elasticity (measured by the diversity of movie genres in the user's profile), for which an approach for elastic serendipity of movie recommendations was proposed (Li et al. 2019). Specifically, as a higher degree of user elasticity may indicate the user's higher tendency to accept movies of different genres, movies with lower relevance but higher diversity could be considered as recommendation candidates for such users. The difference between users' long-term and short-term preferences has also been leveraged in recent work. For instance, Li et al. (2020) regarded items not apparently familiar to the user but that meet the user's short-term demands as well as being related to the user's long-term preference to be highly serendipitous.

Limitations of related work
Although various types of item features or user characteristics have been considered in existing serendipity-oriented RSs, their effects have been largely based on researchers' assumptions and simply tested by offline simulations. To the best of our knowledge, there has been little work to validate empirically their impacts on users' perceptions of recommendation serendipity. Given that serendipity essentially embodies the pleasantly unexpected feelings of users, the lack of validation with real users may impair the applicability of these systems in real-life situations. Moreover, the majority of related work has ignored the differences likely to be brought by domain specialties (Burke and Ramezani 2011).
In our work, we primarily analyzed the effects of three types of item features on users' serendipity perception in two application domains: movie recommendations and e-commerce recommendations. The features include item-driven features (e.g., popularity), profile-driven features (e.g., in-profile diversity), and interaction-driven features (e.g., attribute co-occurrence). For user characteristics, we focused on both curiosity and personality (defined by standard psychometric measurements) in our analyses. In addition, we considered some demographic properties (e.g., age and gender) as they were found to have an effect on users' general behaviors and preferences (Hu and Pu 2013;Schedl et al. 2015).

User survey datasets
We employed two user survey datasets to conduct the analysis: One was collected by Kotkov et al. (2018a) through Movielens (a popular movie recommender system developed by the Grouplens research lab at the University of Minnesota), i.e., the Movielens Serendipity Dataset; and the other was collected on Mobile Taobao (a popular mobile e-commerce app in China), i.e., the Taobao Serendipity Dataset (Chen et al. 2019b;Wang et al. 2020).

Movielens Serendipity Dataset
In the experiment described by Kotkov et al. (2018a), the members of Movielens were invited to join the survey from April 1, 2017 to January 15, 2018. Movies that were unpopular (with fewer ratings) but liked by the target user (with a rating above 3.5 out of 5 in the user's historical records) were presented to the user to obtain their feedback on how novel or unexpected the movie was. The survey was thus performed in a retrospective manner, as users were asked to recall their feelings when they first received the recommendation. Because previous studies show that the stronger the emotion, the more long-lasting the memory is (Alberini 2010;Hollis and Brown 2010), and serendipity is a relatively impressive perception, we believe that investigating this dataset is worthwhile. The originally released survey dataset contains 480 users' responses to the survey questions. We filtered out cases where fewer than five movies were rated by a user before they rated the surveyed movie, to avoid a very short user profile in our analyses. This left 467 users with 2,019 responses in total (see statistics in Table 1). We then constructed a three-month user profile for each surveyed pair (user, item) that includes the user's historical rating records in the three months before they rated the surveyed item.
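The filtering and profile-construction step described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the record layout `(movie_id, timestamp)`, the function name `build_profile`, the reading of "three months" as 90 days, and the choice to count all earlier ratings (rather than only those inside the window) toward the five-movie minimum are assumptions.

```python
from datetime import datetime, timedelta


def build_profile(ratings, surveyed_time, window_days=90, min_prior=5):
    """Build the rating profile preceding one surveyed (user, item) pair.

    ratings: list of (movie_id, timestamp) for a single user (assumed layout).
    Returns the ratings made within `window_days` before `surveyed_time`,
    or None when the user rated fewer than `min_prior` movies beforehand.
    """
    # Keep only ratings made strictly before the surveyed movie was rated.
    prior = [(m, t) for m, t in ratings if t < surveyed_time]
    if len(prior) < min_prior:
        return None  # case filtered out: profile too short
    # Assumption: "three months" is approximated by a 90-day window.
    start = surveyed_time - timedelta(days=window_days)
    return [(m, t) for m, t in prior if t >= start]
```

Under these assumptions, a case is dropped when `build_profile` returns `None`, and otherwise the returned list plays the role of the three-month profile used in the later feature computations.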
In their survey, participants gave feedback on two novelty questions (strict novelty and motivational novelty) and four unexpectedness questions (unexpectedness to be relevant, unexpectedness to be found, implicit unexpectedness, and unexpectedness to be recommended) (see Table 2). Users were also asked whether the recommendation helped to broaden their preferences (preference broadening). Serendipity was then defined as various combinations of novelty and unexpectedness. For instance, one serendipity definition is "strict serendipity to be found," which is TRUE if a user's ratings on strict novelty and unexpectedness to be found are both higher than 3 on a 5-point Likert scale. Through cumulative link mixed-effect regression analysis, it was found that serendipitous movies according to seven serendipity definitions (all except the definition that depends on motivational novelty and unexpectedness to be relevant) can help broaden user preferences, but no serendipity definition can help increase user satisfaction with the movie (Kotkov et al. 2018a).
However, the survey did not acquire users' perceptions of the recommendation serendipity directly, so it is hard to say which definition more precisely reflects the user's feedback on serendipity itself. Therefore, based on their collected user responses, we aimed to identify a more accurate measurement of user serendipity perception, to accommodate both the relevance and unexpectedness of the recommended item as discussed before. For this purpose, we re-analyzed their data in the following two steps:
Step 1: We performed principal component analysis (PCA). This showed that the correlation coefficients are all smaller than 0.8, suggesting that there is no collinearity (Farrar and Glauber 1967) among the observed variables in Table 2. Moreover, we found that only unexpectedness to be relevant exhibited a cross-loading effect (with higher factor loadings on two latent factors). Considering that it is harmful to user satisfaction as found in the original paper (Kotkov et al. 2018b), we decided to exclude it from the subsequent analysis. As a result, there are two latent factors identified (see Fig. 1): "novelty" (related to observed variables strict novelty and motivational novelty) and "unexpectedness" (related to observed variables unexpectedness to be found, implicit unexpectedness, and unexpectedness to be recommended).
Step 2: We then ran structural equation modeling (SEM) using SPSS Amos. The final model satisfied several major fit indices: $\chi^2 = 1.562$ ($df = 1$, $p = 0.211$), CFI = 1.000, AGFI = 0.994, RMSEA = 0.021. There are two major findings according to the final path model (see Fig. 1): (1) The impact of "novelty" on preference broadening is not significant (coef. = 0.03, p > 0.05). (2) "Unexpectedness" has a significantly positive impact on preference broadening (coef. = 0.73, p < 0.001). Therefore, as the latent factor "novelty" does not engender significant effects, while "unexpectedness" can lead users to broaden their preferences, we finally chose "unexpectedness" as the indicator of users' unexpected feeling. Moreover, considering that all of the surveyed movies had high relevance to user preference (i.e., the rating was above 3.5 out of 5), it might be reasonable to use the "unexpectedness" score to represent users' serendipity perceptions. Thus we calculated the "serendipity" score as the average of its three associated observable variables' values (i.e., unexp_find, unexp_imp, and unexp_rec).
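As a minimal illustration of the final scoring step, the "serendipity" score of one survey response can be computed as the plain average of the three unexpectedness items. The dictionary layout of a response and the function name `serendipity_score` are assumptions made for this sketch; only the three variable names come from the text.

```python
from statistics import mean

# The three observed variables loading on the "unexpectedness" factor.
UNEXPECTEDNESS_ITEMS = ("unexp_find", "unexp_imp", "unexp_rec")


def serendipity_score(response):
    """Average the three unexpectedness ratings (5-point Likert) of a response.

    response: dict mapping survey variable names to ratings (assumed layout).
    Other variables, such as the novelty items, are ignored.
    """
    return mean(response[name] for name in UNEXPECTEDNESS_ITEMS)


# Hypothetical response used only to show the call.
example_response = {"unexp_find": 4, "unexp_imp": 3, "unexp_rec": 5, "strict_novelty": 2}
example_score = serendipity_score(example_response)
```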

Taobao Serendipity Dataset
From December 21, 2017 to March 17, 2018, a user survey was conducted on Mobile Taobao (Chen et al. 2019b;Wang et al. 2020). In this survey, we obtained users' feedback on the recommendation's serendipity (by asking them to directly rate the statement "The item recommended to me is a pleasant surprise."), and also their demographic properties (including age and gender) and psychological traits (curiosity and Big-Five personality traits; see Table 4). Specifically, user curiosity was assessed via the Curiosity and Exploration Inventory-II (CEI-II) (Kashdan et al. 2009), which contains 10 items measuring user "motivation to seek out knowledge and new experiences" and "willingness to embrace the novel, uncertain, and unpredictable nature of everyday life." Personality is grounded on the Big-Five factor model and was assessed by the standard Ten-Item Personality Inventory (TIPI) (Gosling et al. 2003), which measures each personality trait via two opposite questions (e.g., "open to new experiences, complex" and "conventional, uncreative" for Openness to Experience). We received 13,741 users' responses. After filtering out invalid answers (e.g., incomplete responses or contradictory ratings to opposite questions) and outliers (e.g., users who clicked fewer than five or more than 15,000 items before taking the survey, or responses related to uncategorized items), 11,383 users' responses remained, of which 7,769 were from female respondents. We had the past three months' historical data for each user (i.e., the items they had clicked and purchased before taking part in the survey). Table 3 lists the statistics of user survey data and profile data. Moreover, as shown in Fig. 2, the item taxonomy (as provided by Mobile Taobao) exhibits a tree structure containing five layers (levels). The leaf category at the end of each path is the direct category that an item belongs to.
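To make the questionnaire handling more concrete, the sketch below shows the conventional TIPI scoring for one trait (the negatively keyed item is reverse-scored on the 7-point scale and averaged with the positively keyed item) together with the click-count outlier rule quoted above. Whether the dataset was scored in exactly this way is an assumption, and the function names are hypothetical.

```python
def tipi_trait_score(direct_rating, opposite_rating, scale_max=7):
    """Score one Big-Five trait from its two opposite TIPI questions.

    The opposite (negatively keyed) question is reverse-scored as
    (scale_max + 1 - rating) and then averaged with the direct question.
    """
    reversed_rating = scale_max + 1 - opposite_rating
    return (direct_rating + reversed_rating) / 2


def has_valid_click_count(n_clicked, lower=5, upper=15000):
    """Outlier rule from the text: drop users who clicked fewer than five
    or more than 15,000 items before taking the survey."""
    return lower <= n_clicked <= upper
```

For example, a respondent answering 6 to "open to new experiences, complex" and 2 to "conventional, uncreative" would receive an Openness to Experience score of 6 under this scoring.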
Therefore, in comparison with the Movielens Serendipity Dataset, the Taobao Serendipity Dataset not only differs in terms of application domain (e-commerce products vs. movies), but also category structure (hierarchical vs. single-level). Moreover, it contains data of user characteristics.

Table 4 Survey questions in Taobao Serendipity Dataset (serendipity and curiosity questions rated on a 5-point Likert scale from 1-"strongly disagree" to 5-"strongly agree"; Big-Five personality questions rated on a 7-point Likert scale from 1-"strongly disagree" to 7-"strongly agree")

Fig. 2 Item taxonomy of Mobile Taobao: 1st-level categories (20), 2nd-level categories (91), 3rd-level categories (1259), 4th-level categories (5445), 5th-level categories (2675); example leaf categories: Student uniform, Work uniform

Item features
In this section, we describe how we have classified and extracted item features from the two datasets. The extracted item features can be classified into three groups: item-driven, profile-driven, and interaction-driven. All the item features are summarized in Table 5.

Item-driven feature
The item-driven feature considers the information of the target item $r$ (i.e., the current recommendation in the survey data), regardless of the target user $u$'s profile $P_u$ (which consists of items that $u$ has previously interacted with). In other words, it is non-personalized, as the same item's feature values for different users are the same. As mentioned in Sect. 2, item unpopularity is one of the typical features used to indicate the item's unexpectedness level in related work (Kaminskas and Bridge 2016), for which the popularity is usually determined by the number of users who have interacted with that item.
In the Movielens dataset, we computed the item popularity in a similar way, i.e., the percentage of users who have rated the movie. In the Taobao dataset, because the interaction data of all users are not available, we estimated popularity through a binary function by using the HOT function provided by Mobile Taobao (Chen et al. 2019b), which maintains a set of the most popular items. In this way, we measured the target item $r$'s popularity as 1 if the item appears in the output of the HOT function, and 0 otherwise.
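The two popularity measures can be sketched as below. The data structures (`raters_by_item`, a mapping from item to the set of users who rated it, and `hot_items`, a set standing in for the output of the HOT function) and both function names are assumptions for illustration; the HOT function itself is a Mobile Taobao facility that is not reproduced here. The Movielens value is returned as a fraction in [0, 1] rather than a percentage.

```python
def movielens_popularity(item_id, raters_by_item, n_users):
    """Share of all users who have rated the movie (fraction in [0, 1])."""
    return len(raters_by_item.get(item_id, ())) / n_users


def taobao_popularity(item_id, hot_items):
    """Binary popularity: 1 if the item is among the most popular items
    (here an assumed precomputed set), otherwise 0."""
    return 1 if item_id in hot_items else 0
```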

Profile-driven feature
Table 5 Summary of item features

Item-driven
Popularity: The item's popularity

Profile-driven
Profile size: The total number of interactions in the user profile
In-profile diversity: The entropy of item categories in the user profile

Interaction-driven, category-level
Interaction: The frequency that the user interacted with items of the same category as that of the target item
Day-of-the-week interaction: The frequency that the user interacted with items of the same category on the same day of the week as the target item
Time-of-the-day interaction: The frequency that the user interacted with items of the same category at the same time of the day as the target item
Time difference: The time distance from the target item $r$ to the latest item in the user profile that belongs to the same category

Interaction-driven, item-level
Content-based distance (min/rec): Jaccard distance based on common content labels of two items
Collaborative-based distance (min/rec): Jaccard distance based on common users who interacted with the same items
PMI distance (min/rec): The distance based on Point-wise Mutual Information (PMI) that indicates the probability that two items are seen by the same user
Category difference (min/rec): The category level at which two items differ from each other (applicable only to the hierarchical category structure)
Taxonomic distance (min/rec): The minimum number of hops from one item's leaf category to another's (applicable only to the hierarchical category structure)

Contrary to item-driven features, profile-driven features depend on the target user $u$'s profile $P_u$, but not on the target item $r$. It should be noted that in the Movielens dataset, the type of user profile is rating-based, $P_u^{rating}$ (previously rated movies), while in the Taobao dataset, each user has two types of profile: click-based $P_u^{click}$ (previously clicked items) and purchase-based $P_u^{purchase}$ (previously purchased items). To be specific, we analyzed two profile-driven features.
1. Profile size. This feature indicates the total number of interactions in the user profile: The larger the profile size is, the more active the user has been in interacting with items.
2. In-profile diversity. Inspired by Wu and Chen (2015), this feature reflects the degree of diversity of items within the user profile, so the higher the diversity is, the more likely the user is to want to experience different types of items. Formally, we employed entropy (Adomavicius and Kwon 2012) to define this feature:
$$-\sum_{c \in C} \frac{|\{t_{ui} \mid i \in P_u \wedge c \in C_i\}|}{|\{t_{ui}\}|} \log \frac{|\{t_{ui} \mid i \in P_u \wedge c \in C_i\}|}{|\{t_{ui}\}|}$$
where $t_{ui}$ denotes the timestamp when user $u$ interacted with item $i$, $|\{t_{ui}\}|$ is the number of the user's historical records, $c$ denotes a category from the entire category set $C$, and $C_i$ is the category set of the item $i$. The ratio $\frac{|\{t_{ui} \mid i \in P_u \wedge c \in C_i\}|}{|\{t_{ui}\}|}$ therefore returns the probability that items belonging to a specific category $c$ are visited by the target user $u$. In the Movielens dataset, the category set $C$ contains 19 categories (e.g., "Drama," "Fantasy," and "Romance"). In the Taobao dataset, $C$ is the set of all leaf categories (9,085 in total) and $C_i$ is item $i$'s leaf category.
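The entropy-based in-profile diversity can be illustrated with the short sketch below. It assumes the profile is given as one category set per historical record, uses the natural logarithm (the base is not stated in the text), and skips categories that never occur since they contribute nothing to the sum; the function name is hypothetical.

```python
import math
from collections import Counter


def in_profile_diversity(profile_categories):
    """Entropy of item categories in a user profile.

    profile_categories: list with one entry per historical record, each entry
    being the set of categories of the interacted item (assumed layout).
    """
    n_records = len(profile_categories)
    if n_records == 0:
        return 0.0
    # Number of records whose item belongs to each category c.
    counts = Counter(c for cats in profile_categories for c in cats)
    # Sum -p * log(p) over the categories that occur in the profile.
    return -sum((k / n_records) * math.log(k / n_records) for k in counts.values())
```

With single-category items (as with Taobao leaf categories), the per-category ratios sum to one and the value is an ordinary Shannon entropy; with multi-category movies, the ratios follow the definition in the text and need not sum to one.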

Interaction-driven features
In addition to features that rely on either the target item $r$ or the target user $u$, the third type of item features depends on the interaction between $u$ and those items related to $r$, called the interaction-driven features. According to the granularity of the information used, we divided those features into two groups: category-level interaction-driven features and item-level interaction-driven features.

Category-level interaction-driven features
Category-level interaction-driven features inspect the category-level similarity of in-profile ($P_u$) items to the target item $r$, for which we define five sub-types according to the different similarity measures considered.
1. Interaction. This feature measures the frequency with which the user has interacted with items (in their profile $P_u$) of the same category as that of the target item $r$, i.e., items $i$ satisfying $c_i = c_r$. In the Movielens dataset, $c_i = c_r$ means that items $i$ and $r$ have at least one category in common. In the Taobao dataset, $c_i = c_r$ means that $i$ and $r$ have the same leaf-level category.
2. Temporal interaction. This measures the frequency with which the user has interacted with items of the same category as that of the target item at a similar time, given that user preferences can be time-sensitive (Shen et al. 2020;Khoshahval et al. 2018). Taking into consideration the three-month user profile, we derived two time-sensitive features inspired by Khoshahval et al. (2018): day-of-the-week interaction and time-of-the-day interaction.
• Day-of-the-week interaction: the frequency of same-category interactions that additionally satisfy $\mathrm{day}(t_{ui}) = \mathrm{day}(t_{ur})$, where $\mathrm{day}(t_{ui})$ encodes the timestamp $t_{ui}$ as 0 (Sunday), 1 (Monday), ..., or 6 (Saturday). $\mathrm{day}(t_{ui}) = \mathrm{day}(t_{ur})$ means that the day of the week when the user interacted with the item $i$ is the same as the day when the user received the target item $r$.
• Time-of-the-day interaction: the frequency of same-category interactions that additionally satisfy $\mathrm{period}(t_{ui}) = \mathrm{period}(t_{ur})$, where $\mathrm{period}(t_{ui})$ encodes the timestamp $t_{ui}$ as 1 (morning, from 6 a.m. to 12 p.m.), 2 (afternoon, from 12 p.m. to 6 p.m.), 3 (evening, from 6 p.m. to 12 a.m.), or 4 (night, from 12 a.m. to 6 a.m.); $\mathrm{period}(t_{ui}) = \mathrm{period}(t_{ur})$ means that the time of day when the user interacted with $i$ is the same as the time when the user received $r$.
3. Time difference. This feature captures the time distance between the target item $r$ and the latest one in user profile $P_u$ that belongs to the same category:
$$\mathrm{TimeDiff}(r, P_u) = t_{ur} - \mathrm{latest}(\{t_{ui} \mid i \in P_u \wedge c_i = c_r\})$$
where $\mathrm{latest}(\cdot)$ returns the most recent timestamp. The result of $\mathrm{TimeDiff}(r, P_u)$ is the numerical time difference in days (e.g., 3.15 days).
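The category-level features listed above can be sketched together as follows. Several choices here are assumptions rather than the authors' definitions: the profile layout `(categories, timestamp)`, all function names, the normalization of each frequency by the profile size (the text does not say whether raw counts or shares are used), and returning `None` for the time difference when no same-category item exists.

```python
from datetime import datetime


def same_category(cats_i, cats_r):
    # Movielens reading: at least one shared category. With a single leaf
    # category per item (Taobao) this reduces to equality.
    return bool(set(cats_i) & set(cats_r))


def day_code(t):
    # 0 = Sunday, 1 = Monday, ..., 6 = Saturday (datetime.weekday() has Monday = 0).
    return (t.weekday() + 1) % 7


def period_code(t):
    # 1 = morning (6-12), 2 = afternoon (12-18), 3 = evening (18-24), 4 = night (0-6).
    if 6 <= t.hour < 12:
        return 1
    if 12 <= t.hour < 18:
        return 2
    if t.hour >= 18:
        return 3
    return 4


def category_level_features(profile, target_cats, target_time):
    """profile: list of (categories, timestamp) records for one user."""
    n = len(profile)
    same = [t for cats, t in profile if same_category(cats, target_cats)]
    same_day = [t for t in same if day_code(t) == day_code(target_time)]
    same_period = [t for t in same if period_code(t) == period_code(target_time)]

    def share(rows):
        # Assumption: frequencies are expressed as shares of the profile size.
        return len(rows) / n if n else 0.0

    # Days elapsed since the most recent same-category interaction.
    time_diff = (target_time - max(same)).total_seconds() / 86400 if same else None
    return {
        "interaction": share(same),
        "day_of_week_interaction": share(same_day),
        "time_of_day_interaction": share(same_period),
        "time_diff_days": time_diff,
    }
```

Swapping the share for a raw count only requires changing the `share` helper, so the sketch can be adapted once the intended normalization is known.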

Item-level interaction-driven features
Item-level interaction-driven features are based on pairwise item similarity to reveal the relationship between the target item $r$ and those in the target user $u$'s profile $P_u$, for which we also identify five sub-types according to related work (Kaminskas and Bridge 2014, 2016; Nakatsuji et al. 2010). For each sub-type, we investigate two variants: the minimal (min) variant, which considers the smallest feature value between the target item and the items in the target user's profile, and the recent (rec) variant, which considers the feature value between the target item and the item that the target user most recently interacted with. The former might indicate the target user's overall preference, while the latter is more related to the user's recent preference. Note that we did not calculate the average distance because it was found to result in information loss in related work (Kaminskas and Bridge 2014, 2016).
1. Content-based distance (min/rec). The two features use the Jaccard distance to measure the dissimilarity of two items based on their content information. We adopted the Jaccard distance because it performed better than alternatives (e.g., cosine distance) in our preliminary experiment. Concretely, the minimal content-based distance is the smallest Jaccard distance between the target item and any item in the profile, $\min_{i \in P_u} \left(1 - |C_r \cap C_i| / |C_r \cup C_i|\right)$, where $C_i$ denotes the set of categories of item $i$. The recent variant is the Jaccard distance between $C_r$ and $C_{rec}$, where $C_{rec}$ denotes the set of categories of the item the user most recently interacted with.

2. Collaborative-based distance (min/rec). Features of this type are based on the ratio of users who accessed both the target item $r$ and a given item in the target user's profile among users who accessed either of them. The minimal collaborative-based distance is defined over all items $i \in P_u$, where $U_i$ is the set of users who have interacted with item $i$. The recent collaborative-based distance is defined analogously using $U_{rec}$, the set of users who have interacted with the item the user most recently interacted with.
3. PMI distance (min/rec). Point-wise Mutual Information (PMI) can be used to check the probability that two items are seen by the same user, under the assumption that if the target item is rarely observed with items in the target user's profile $P_u$, it will be more unexpected to the user (Kaminskas and Bridge 2014, 2016). In our work, the minimal PMI distance is computed from $p(i)$, the probability that item $i$ is interacted with by any user, and $p(r, i)$, the probability that the two items $r$ and $i$ are interacted with by the same user. The recent PMI distance is computed in the same way for $i_{rec}$, the item that the user most recently interacted with.
4. Category difference (min/rec). Category difference indicates at which category level the two compared items become different over the item taxonomy (applicable only to the hierarchical category structure). The minimal category difference is defined in terms of $L$, the maximal number of layers (levels) in the item taxonomy (e.g., 5 in the Taobao dataset as shown in Fig. 2). Therefore, the largest possible category difference is $L + 1$ (6 in the Taobao dataset), when the two items do not share any common category (i.e., are different starting from the top-level category). The smallest is 1, when the two items belong to the same leaf category at the fifth level. The recent variant is calculated between the target item $r$ and the item that the user most recently interacted with.

5. Taxonomic distance (min/rec). Taxonomic distance is also applicable only to the hierarchical category structure. It is the minimum number of hops from one item $i$'s leaf category $c_i$ to another item's leaf category over the item taxonomy (called class distance in Nakatsuji et al. (2010)). The minimal taxonomic distance is calculated as
$$\mathit{TaxoDist}_{min}(r, P_u) = \min_{i \in P_u} hops(c_r, c_i)$$
and the recent variant is
$$\mathit{TaxoDist}_{rec}(r, P_u) = hops(c_r, c_{rec}).$$
To illustrate the difference between category difference and taxonomic distance, suppose there are two items, of which item A belongs to the leaf category "Student uniform" and item B belongs to the leaf category "Work uniform." The category difference between the two items is 6 − 3 = 3, because they fork at the third level of Taobao's item taxonomy (see Fig. 2), while their taxonomic distance is 2 because there are two hops from A's leaf category "Student uniform" to B's leaf category "Work uniform."
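The min/rec pattern and the hop count can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the authors' implementation: the collaborative-based distance is assumed to be one minus the described user-overlap ratio (i.e., a Jaccard distance over user sets), each leaf category is represented by its root-to-leaf path, and the upper-level category names in the example are hypothetical.

```python
def jaccard_distance(a, b):
    # 1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def min_distance(target, profile_sets):
    # "min" variant: smallest distance from the target to any profile item.
    return min(jaccard_distance(target, s) for s in profile_sets)

def rec_distance(target, profile_sets):
    # "rec" variant: distance to the most recent profile item.
    # Assumption: profile_sets is ordered from oldest to newest.
    return jaccard_distance(target, profile_sets[-1])

def hops(path_a, path_b):
    # Number of edges between two leaf categories in a tree taxonomy,
    # given their root-to-leaf paths.
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)

# Content-based use: sets of categories (C_r versus C_i).
profile_categories = [{"Drama", "Romance"}, {"Comedy", "Drama"}, {"Horror"}]
target_categories = {"Comedy", "Romance"}

# Taxonomic use: sibling leaves, as in the "Student uniform" example.
student = ["Clothing", "Uniform", "Student uniform"]  # upper levels are made up
work = ["Clothing", "Uniform", "Work uniform"]
```

Passing sets of user IDs ($U_r$, $U_i$) instead of category sets to the same two helpers gives the collaborative-based variants under the assumption stated above.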

User characteristics
As mentioned before, we have users' basic demographic properties and psychological trait values in the Taobao Serendipity Dataset.

Demographic characteristics
As shown in Table 3, participants of the Taobao user survey were distributed among six age groups: 18-20 (3274 users), 20-30 (4701 users), 30-40 (2433 users), 40-50 (735 users), 50-60 (166 users), and above 60 years old (74 users). To simplify the analysis, we divided all users into two age groups: younger users who were less than 30 years old (70.06%) and older users who were over 30 years old (29.94%). The other general demographic property is gender. There were 7769 (about 68.25%) female participants and 3614 (about 31.75%) males. The demographic distributions regarding both age and gender are largely in line with the statistics on online shopping users in China (Mobile 2018;Center CINI 2018).

Psychological characteristics
There are different psychological characteristics considered in recommender systems, among which curiosity has been more frequently discussed under the context of serendipity-oriented recommendations (see Sect. 2.2). In addition, given that curiosity is found to significantly correlate with user personality traits, we also acquired users' personality trait values based on the popularly used Big-Five factor model (Gosling et al. 2003).

1. Curiosity. Curiosity is widely regarded as an important antecedent of users' appetite for new knowledge or experience in the field of psychology (Berlyne 1960). It can greatly affect the level of pleasure a user may experience when exploring new and unexpected things. Previous research shows that users prefer items with a stimulation degree that is neither too low nor too high relative to their curiosity (Zhao and Lee 2016). In recommender systems, curiosity is found to play a significant moderating role in strengthening the relationship from novelty to serendipity and that from serendipity to satisfaction (Chen et al. 2019b). In the Taobao user survey, a popularly used curiosity quiz, i.e., the ten-item Curiosity and Exploration Inventory-II (CEI-II) (Kashdan et al. 2009), was adopted to measure whether the user has a strong desire for new knowledge or experiences (see description in Sect. 3). The mean (standard deviation) of Taobao users' curiosity scores is 3.138 (0.819), and the distribution is non-normal according to the Kolmogorov-Smirnov test (p < 0.001; see Table 4).
2. Big-Five Personality Traits. The Big-Five factor model (also known as the OCEAN model) is a popularly used taxonomy that defines personality (Gosling et al. 2003) in terms of the following five traits:
• Openness to Experience indicates a person's facets such as imagination, preference for variety, and intellectual curiosity.
• Conscientiousness indicates planned rather than spontaneous behaviors. Conscientious people are dependable and self-disciplined, aiming for achievement.
• Extraversion is defined as "an attitude type characterized by concentration of interest on the external object" (Jung 1983). People with high extraversion tend to be enthusiastic, outgoing, and talkative.
• Agreeableness implies facets such as trust, altruism, and tender-mindedness. People scoring high on this trait are empathetic and willing to cooperate.
• Neuroticism is concerned with anxiety, hostility, depression, self-consciousness, impulsiveness, and vulnerability (Costa et al. 1992). People who score high on neuroticism are more likely to be moody, feel stressed, and have difficulty in delaying gratification.
The standard Ten-Item Personality Inventory (TIPI) was used to assess participants' Big-Five personality values in the Taobao user survey (see Table 4).
6 Results analysis

Analysis methods
To identify the impact of item features on users' perceptions of recommendation serendipity, we employed two nonparametric statistical methods, Spearman's correlation and the Mann-Whitney U test, because users' responses are not normally distributed in either dataset according to the Kolmogorov-Smirnov test (see Tables 2 and 4). Specifically, Spearman's correlation measures the association between two variables based on their rankings, so it is suitable for cases where users' feedback on serendipity was collected using a Likert scale, which is commonly treated as an ordinal response (Hildebrand et al. 1977). We also ran the Mann-Whitney U test for each feature between the high and low serendipity groups, which were divided via the median split method (Iacobucci et al. 2015). In the Movielens dataset, there are 952 (user, item) pairs (47.2%) in the high serendipity group (rating > median 2.33) and 984 (user, item) pairs (48.7%) in the low serendipity group. In the Taobao dataset, there are 5389 users (47.3%; each user rated only one item) in the high serendipity group (rating > median 2.0) and 5994 (52.7%) in the low serendipity group. We present the results of the Spearman's correlation and the Mann-Whitney U test in Sect. 6.2. For user characteristics that can be defined as categorical variables (e.g., female vs. male, younger vs. older users, highly curious vs. less curious users), we ran the Mann-Whitney U test for the two-group comparison (see results in Sect. 6.3).
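The two statistics described above can be sketched in plain Python as follows. This is an illustrative sketch on made-up numbers, not the analysis code of this study: it computes Spearman's coefficient as the Pearson correlation of average ranks and the Mann-Whitney U statistics from rank sums, assigns ratings equal to the median to the low group (an assumption), and omits p-values. In practice a statistics library such as SciPy (`scipy.stats.spearmanr`, `scipy.stats.mannwhitneyu`) would normally be used.

```python
from statistics import mean, median

def average_ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the rank-transformed data.
    return pearson(average_ranks(x), average_ranks(y))

def median_split(ratings, feature):
    # High group: rating strictly above the median; everything else is "low".
    m = median(ratings)
    high = [f for r, f in zip(ratings, feature) if r > m]
    low = [f for r, f in zip(ratings, feature) if r <= m]
    return high, low

def mann_whitney_u(a, b):
    # U statistics of both groups, computed from the rank sum of group a.
    ranks = average_ranks(a + b)
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    return u_a, len(a) * len(b) - u_a

# Made-up data: serendipity ratings and item popularity for six responses.
ratings = [1, 2, 2, 3, 4, 5]
popularity = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
rho = spearman(ratings, popularity)
high, low = median_split(ratings, popularity)
u_high, u_low = mann_whitney_u(high, low)
```

In this toy data a negative `rho` and a small `u_high` both point in the same direction: the high serendipity group contains the less popular items.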
We then fused all the features together into a logistic regression model to investigate the relative importance of each item feature or user characteristic in predicting the user's perceived serendipity. The results are given in Sect. 6.4.
To further investigate the influence of user profile length, we ran both Spearman's correlation and the Mann-Whitney U test while varying the profile length from 1 week to a maximum of 12 weeks (i.e., three months). The results are shown in Sect. 6.5.
Table notes: (1) ** p < 0.01 and * p < 0.05 for both Spearman's correlation and the Mann-Whitney U test. (2) For the Mann-Whitney U test, the group with the higher value is marked (L for the low serendipity group and H for the high serendipity group).

Impact of item features
From the results in Table 6, we can see that the significance and direction (positive/negative) of Spearman's correlations are largely consistent with the results of the Mann-Whitney U test in each dataset. We also notice that the absolute correlation values are not high, which might be because the ordinal responses (on a 5-point Likert scale) in a large-scale dataset yield coarse-grained values. Therefore, in the following, we present findings from both tests.

Item-driven Feature
• Popularity. In both the Movielens and Taobao datasets, popularity is significantly negatively correlated with users' perceived serendipity (e.g., corr. = −0.118, p < 0.01 in the Movielens dataset and corr. = −0.138, p < 0.01 in the Taobao dataset). The Mann-Whitney U test reveals that the high and low serendipity groups are significantly different in terms of item popularity (e.g., mean = 0.0124 vs. 0.0191, p < 0.001 in the Movielens dataset; and 0.1895 vs. 0.3041, p < 0.001 in the Taobao dataset). The results hence suggest that less popular items are more likely to be perceived as serendipitous by users.

Profile-driven Feature
• Profile size. In the Movielens dataset, the relationship between profile size and users' perceived serendipity is not significant. But in the Taobao dataset, results show that users with a larger profile size perceived less serendipity (corr. = −0.036, p < 0.01 in the Taobao click-based dataset and corr. = −0.051, p < 0.01 in the purchase-based dataset). Thus, for e-commerce recommendations, the more active the user is, the more difficult it might be for the user to perceive a recommendation as serendipitous. This is reasonable since active users may have more knowledge and a broader perspective on various items.
• In-profile diversity. In both the Movielens dataset and the Taobao click-based dataset, the results of Spearman's correlation show that in-profile diversity has a significantly positive correlation with users' perceived serendipity (corr. = 0.051, p < 0.01 in the Movielens dataset, and corr. = 0.021, p < 0.05 in the Taobao click-based dataset). However, in the Taobao purchase-based dataset, in-profile diversity is negatively correlated with users' perceived serendipity (corr. = −0.029, p < 0.01). This difference may be explained by the fact that movies are essentially for entertainment consumption (i.e., dominantly hedonic and aesthetic) (Oliver and Raney 2011; Cooper-Martin 1991), while online shopping is mainly for practical consumption (i.e., to fulfill utilitarian functions) (Cooper-Martin 1991; Ford et al. 1988) (see Sect. 7 for more discussion). Hence, the interpretation is that for movie recommendations, the more diverse the movies watched before, the more likely the user is to perceive a new recommendation as serendipitous. However, for e-commerce recommendations, the more diverse the purchases made, the more difficult it is for the user to perceive a new recommendation as serendipitous.

Category-level Interaction-driven Features:
• Interaction. This feature reflects a user's preference for a certain category. In the Movielens dataset, results of both the Spearman's correlation and the Mann-Whitney U test indicate a significantly negative relationship with users' perceptions of serendipity (e.g., corr. = −0.072, p < 0.01), suggesting that movie categories that users have interacted with less are more likely to be serendipitous to them. However, the direction is the opposite in the Taobao dataset. The correlations between interaction and users' perceived serendipity in both the click-based dataset and the purchase-based dataset are positive (corr. = 0.057 and 0.032, respectively, p < 0.01), suggesting that in an e-commerce recommender system, users are more likely to be pleasantly surprised when they receive recommendations from categories that they have frequently interacted with. Thus, the effects of category-level interaction on users' serendipity perceptions can differ between the two product domains.
• Temporal interaction. The results are similar between the two datasets, indicating that recommending an item from a category that users have frequently accessed at the same time point in the past (the day of the week or the time of the day) is more likely to increase users' perceptions of serendipity. Specifically, in the Movielens dataset, Spearman's correlation is significant regarding day-of-the-week interaction (corr. = 0.075, p < 0.01). In the Taobao dataset, the correlations between users' serendipity perceptions and day-of-the-week interaction/time-of-the-day interaction are all significant (corr. = 0.084 and 0.043 with regard to day-of-the-week interaction in the click-based dataset and the purchase-based dataset, respectively; and corr. = 0.080 and 0.047 with regard to time-of-the-day interaction). This suggests that recommending items in a timely way can lead users to consider them more serendipitous.
• Time difference. In the Movielens dataset, time difference is positively correlated with users' perceived serendipity (corr. = 0.048, p < 0.05), indicating that the more recently the current recommendation's category has been visited by the target user, the less likely the user is to consider it serendipitous. However, in the Taobao dataset, the results of both the Spearman's correlation and the Mann-Whitney U test show that if the category of the current recommendation has been recently clicked or purchased, it may be more serendipitous to the user. In the click-based Taobao dataset, the mean time difference is 9.24 days in the high serendipity group vs. 12.12 days in the low group (corr. = −0.106, p < 0.01); and in the purchase-based Taobao dataset, it is 76.18 days vs. 79.54 days (corr. = −0.123, p < 0.01). The finding in the Taobao dataset contradicts the common assumption that an unexpected item would be from a category that the user has not recently accessed (Kotkov et al. 2016). This suggests that the effects of time difference on users' serendipity perceptions can differ between the two product domains.
A short summary. Through the above analyses, we find: (1) The effects of the two temporal interaction features (i.e., day-of-the-week interaction and time-of-the-day interaction) are similar in the two datasets, i.e., timely recommendations in line with the user's preferences are likely to be perceived as serendipitous. (2) Interaction and time difference affect users' serendipity perceptions in different ways for movie recommendations and e-commerce recommendations. Specifically, users are more likely to perceive serendipity in movie categories that they have rarely watched (i.e., low interaction) or have not watched for a long time (i.e., large time difference); whereas, when shopping online, users are likely to perceive serendipity in products from categories they have frequently or recently visited.

Item-level Interaction-driven Features:
• Content-based distance (min/rec). There is no significant relationship between minimal/recent content-based distance and users' serendipity perceptions in the Movielens dataset, but in the Taobao dataset the results of both the Spearman's correlation and the Mann-Whitney U test show that items with shorter minimal/recent content-based distance are more likely to be perceived as highly serendipitous (corr. = −0.079 and −0.022, respectively, in the click-based and purchase-based datasets, p < 0.01). Thus, recommending an e-commerce item with a shorter minimal/recent content-based distance to the target user's profile can help increase its perceived serendipity.
• Collaborative-based distance (min/rec). Both variants of collaborative-based distance act differently between the two datasets. The larger the distance (min/rec) is, the more serendipitous the item is to users in the Movielens dataset (e.g., corr. = 0.104, p < 0.01 w.r.t. minimal collaborative-based distance and corr. = 0.098, p < 0.01 w.r.t. recent collaborative-based distance), but the opposite is the case in the Taobao dataset (corr. = −0.113 and −0.046, p < 0.01, respectively, w.r.t. the minimal and the recent collaborative-based distances in the Taobao click-based dataset). Thus, the effects of minimal/recent collaborative-based distance on users' serendipity perceptions can differ between the two product domains.

• PMI distance (min/rec). Minimal PMI distance takes effect only in the click-based Taobao dataset. The results show that items with a smaller minimal PMI distance to the user profile are more likely to be perceived as highly serendipitous, implying that in the e-commerce domain a recommendation that has been more frequently observed with items in the user's profile will be more serendipitous to the user, which is different from our assumption. For the recent PMI distance, in the Movielens dataset, it has a positive correlation with users' perceived serendipity (corr. = 0.049, p < 0.05), while the opposite is still the case in the Taobao click-based dataset (corr. = −0.047, p < 0.01). Thus, the effects of recent PMI distance on users' serendipity perceptions can differ between the two product domains.
• Category difference (min/rec). Features of this type, and also of the next one (taxonomic distance (min/rec)), are only applicable to the Taobao dataset, for which a hierarchical item taxonomy is available (see Fig. 2). The minimal category differences (regarding both click-based and purchase-based profiles) in the high serendipity group are significantly shorter than those in the low group (e.g., mean = 0.4118 vs. mean = 0.4445 in the click-based dataset, p < 0.01), implying that a serendipitous recommendation does not necessarily have a bigger category difference from the items the user has interacted with. Similar results are obtained regarding the recent category difference. Thus, the minimal/recent category difference of a highly serendipitous item from the user profile can be shorter.
• Taxonomic distance (min/rec). This refers to the number of hops from one item's leaf category to that of another item. It is interesting to find that the correlations between users' serendipity perceptions and both minimal and recent taxonomic distances in the click-based dataset are significantly negative (corr. = −0.076 and −0.053, respectively, p < 0.01), indicating that the minimal/recent taxonomic distance from the highly serendipitous item to the user profile is also shorter (i.e., traversing fewer nodes).
A short summary. We find that the movies that the user is not familiar with (based on minimal collaborative-based distance) or that are different from what the user recently watched (based on recent collaborative-based/PMI distance) will bring more serendipity. For e-commerce recommendations, users tend to perceive recommendations more relevant to their overall preferences (e.g., with shorter minimal content-based distance) or more relevant to their recent interests (e.g., with shorter recent content-based/PMI distance) as being of greater serendipity.

Impact of user characteristics
This part of the analysis was only conducted in the Taobao dataset that contains information of user characteristics.

Demographic Characteristics
• Age. Because neither the age groups (younger and older) nor the gender groups (female and male) were balanced (see Sect. 3.1), we conducted the Mann-Whitney U test to analyze their effects, as it makes no assumption about the data distribution (Wikipedia Contributors 2022). The results show that the serendipity level perceived by younger users is significantly lower than that of older users, implying that older people are more inclined to feel a recommendation to be serendipitous.
• Gender. For gender, we find that male users gave significantly higher serendipity scores on recommendations than female users did (2.79 vs. 2.59, p < 0.001). This suggests that men are more inclined to feel a recommendation to be serendipitous.

Psychological Characteristics
• Curiosity. We first applied the median split method (Iacobucci et al. 2015) to divide all users into two curiosity groups (i.e., high and low). The results show that people with higher curiosity are likely to perceive a recommendation to be more serendipitous (see Table 7), validating the previous findings that highly curious users are more likely to enjoy novel and unexpected items (Kotkov et al. 2016; Chen et al. 2019b).
• Big-Five personality traits. For each Big-Five personality trait, we split all users into two groups (e.g., high Openness and low Openness), again through the median split method. The results show that people with high Openness to Experience, Conscientiousness, Extraversion, or Neuroticism gave significantly higher scores on item serendipity, while those with high Agreeableness gave significantly lower scores. We further calculated the correlations between user curiosity and the Big-Five personality traits. Openness has the largest positive correlation with curiosity (corr. = 0.410), followed by Extraversion (0.276), Conscientiousness (0.157), and Neuroticism (0.117), while Agreeableness has a negative correlation with curiosity (−0.175). This suggests that highly curious users, who tend to be more open, more conscientious, more extraverted, and more neurotic, are more inclined to perceive a recommendation as highly serendipitous.

Relative importance of features
So far we have analyzed the effects of individual item features and user characteristics. It should be interesting to identify their relative importance in influencing users' serendipity perceptions when being considered together. To this end, we built an ordinal logistic regression model that involves all the item features (and user characteristics for the Taobao dataset) as independent variables and users' serendipity perceptions as the dependent variable, where F denotes the set of item features and C denotes the set of user characteristics, if any. Results are given in Table 8, from which we can see that, in the Movielens dataset, popularity, day-of-the-week interaction, and interaction are three significant predictors (among all considered item features) of serendipity. Among them, the importance of popularity is the highest (odds ratio = 0.821), followed by day-of-the-week interaction (odds ratio = 1.135) and interaction (odds ratio = 0.885). In the click-based Taobao dataset, there are nine item features and six user characteristics (out of a total of 25 features) that are significant predictors of users' serendipity perceptions. Among them, curiosity has the largest importance, followed by age, recent taxonomic distance, popularity, and recent content-based distance. In the purchase-based Taobao dataset, eight item features and six user characteristics are significant predictors, among which curiosity is still the most important, followed by age, minimal content-based distance, minimal category difference, and minimal taxonomic distance.

Table 8 The relative importance of considered features (the number in brackets indicates the importance ranking position, which is based on the absolute difference from 0, i.e., the magnitude of the influence). Columns: Feature, Movielens, Taobao. Only features with significant coefficients are listed in the table. The reported value is the odds ratio Exp(B) (Pantouvakis 2010) with significance adjusted by the Bonferroni correction: *** p < 0.001, ** p < 0.01, * p < 0.05.
There are two major observations regarding the Taobao dataset: 1). Considering the results reported in the previous section (see Table 6), it can be seen that although some item features (e.g., profile size and time-of-the-day interaction) have significant relationships with serendipity, the effects can be weak compared to other features (e.g., popularity and recent content-based distance) in the relative importance analysis. 2). Compared with profile-driven and interaction-driven item features, user characteristics (especially curiosity and age) and item-driven features (i.e., popularity) appear to be more important predictors of users' serendipity perceptions.
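The importance ordering reported for the Movielens dataset can be checked with a small Python sketch. It assumes that "absolute difference from 0" in the Table 8 caption refers to the regression coefficient B, i.e., the natural logarithm of the reported odds ratio Exp(B); this reading and the function name are assumptions, while the three odds ratios are the ones quoted in the text.

```python
import math

# Odds ratios Exp(B) quoted above for the Movielens dataset.
movielens_odds_ratios = {
    "popularity": 0.821,
    "day-of-the-week interaction": 1.135,
    "interaction": 0.885,
}

def rank_by_importance(odds_ratios):
    # Assumption: importance is |B| = |ln(Exp(B))|, so an odds ratio of 0.821
    # (B ≈ -0.197) outweighs an odds ratio of 1.135 (B ≈ 0.127).
    return sorted(odds_ratios,
                  key=lambda name: abs(math.log(odds_ratios[name])),
                  reverse=True)

ranking = rank_by_importance(movielens_odds_ratios)
```

Under this reading the sketch yields popularity first, then day-of-the-week interaction, then interaction, which agrees with the order stated in the text. Ranking by the distance of the raw odds ratio from 1 would give a different order for these three values, which is why the log scale is assumed here.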

Influence of profile length
Another question we were interested in addressing is how the results vary with different lengths of user profile, given that profile-driven and interaction-driven item features can both be affected by the size of the user profile (e.g., a 1-week vs. 12-week profile). For this purpose, we calculated Spearman's correlations for 12 different lengths ranging from one to 12 weeks (maximally three months in our datasets). The results are shown in Table 9. Note that: (1) Time difference is not considered because it only depends on the last time that the user accessed the same category as that of the recommendation. (2) The recent variants of item-level interaction-driven features are not taken into account either because they only consider the most recent interaction in a user profile. (3) Only features that have significant results in at least one profile length setting are illustrated in the figures.
It can be seen that the correlations between interaction-driven item features and users' serendipity perceptions are stronger with a short-term user profile (e.g., a 3-week profile) than those with a long-term profile (e.g., a 12-week profile). For example, the correlation coefficients of category-level interaction-driven features (such as interaction and time-of-the-day interaction in the Movielens dataset, and interaction, day-of-the-week interaction, and time-of-the-day interaction in both the Taobao click-based and purchase-based datasets) with users' perceived serendipity decrease as the profile length increases (e.g., from corr. = 0.100 using the 1-week user profile to corr. = 0.057 with the 12-week profile, regarding interaction in the Taobao click-based dataset). Moreover, in the Movielens dataset, category-level interaction-driven features behave more stably than item-level interaction-driven features. For example, the feature interaction remains significantly correlated with users' perceptions of serendipity, no matter which profile length is considered. However, as the length becomes longer, the correlation between the feature minimal PMI distance and user serendipity perception first changes from nonsignificant to significant (in the 5-week profile), and then back to nonsignificant (in the 9-week profile). Such fluctuation is unfavorable in practical applications, because when the data change, the feature's validity may not be guaranteed. In the Taobao dataset, all the interaction-driven features (both category-level and item-level) are significantly correlated with users' perceived serendipity regardless of the profile length.
However, with the profile-driven features (profile size and in-profile diversity), the longer the profile is, the richer the information it can contain, so the correlation may become stronger. For example, in the Movielens dataset, in-profile diversity is significantly correlated with user serendipity perception when using the 11-week or 12-week profile (corr. = 0.046 and 0.051, respectively, p < 0.05), but there is no significant relationship when the profile length is shorter than 11 weeks. In the Taobao dataset, especially when using the click-based profile, the advantage of using long-term profiles is more obvious. The correlation coefficient w.r.t. in-profile diversity grows as the profile becomes longer, and reaches the maximum value of 0.021 when using the 12-week data.
A short summary. The results show that the shorter the profile is, the more strongly category-level interaction-driven features correlate with users' perceived serendipity, but for the profile-driven item feature in-profile diversity, the longer the profile is, the more strongly it correlates with users' perceived serendipity.
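The profile-length analysis amounts to recomputing each feature on a profile truncated to the last k weeks before the recommendation. A minimal Python sketch of that truncation step is given below; the (item, timestamp) profile representation, the function name, and the sample data are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def truncate_profile(profile, t_ur, weeks):
    # Keep only interactions within the last `weeks` weeks before the
    # recommendation time t_ur (the window start is inclusive).
    start = t_ur - timedelta(weeks=weeks)
    return [(item, ts) for item, ts in profile if start <= ts < t_ur]

# Hypothetical profile: interactions 3, 20, and 80 days before the recommendation.
t_ur = datetime(2020, 6, 1, 12, 0)
profile = [
    ("item_a", t_ur - timedelta(days=80)),
    ("item_b", t_ur - timedelta(days=20)),
    ("item_c", t_ur - timedelta(days=3)),
]

# Profile size for each window length from 1 to 12 weeks.
sizes = {k: len(truncate_profile(profile, t_ur, k)) for k in range(1, 13)}
```

Any of the profile-driven or interaction-driven features can then be computed on `truncate_profile(profile, t_ur, k)` for each k and correlated with the serendipity ratings.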

Major findings
In this section, we summarize our major findings in response to the research questions raised at the beginning. This study investigated the effects of a number of item features on user perceptions of recommendation serendipity with two user survey datasets: the Movielens Serendipity Dataset (Kotkov et al. 2018a) containing 467 users' serendipity feedback and the Taobao Serendipity Dataset (Chen et al. 2019b;Wang et al. 2020) with 11,383 users' feedback. The analyses disclose domain-independent and domain-specific observations (see the summary of all major observations in Table 10). Specifically, we find that some features have consistent results across the two datasets: (1) Lower item popularity is significantly correlated with higher recommendation serendipity as perceived by the user. (2) The results about day-of-the-week interaction and time-of-the-day interaction show that recommendations that meet users' daily or weekly behavior patterns may lead to a sense of serendipity.
Several features, however, have different effects in the two datasets: (1) Users are more likely to experience serendipity from movies in categories that they have rarely watched (i.e., lower interaction) or have not watched for a long time (i.e., larger time difference), or movies that differ from the previously visited items (i.e., longer minimal collaborative-based distance) or the most recently interacted one (i.e., longer recent collaborative-based/PMI distance).
(2) In the e-commerce domain, users experience more serendipity from products in categories that they have frequently (i.e., higher interaction) or recently (i.e., smaller time difference) clicked or purchased, or from products similar to their previously clicked or purchased items (i.e., smaller minimal content-based distance / collaborative-based distance / PMI distance / category difference / taxonomic distance) or similar to their most recent interests (i.e., smaller recent content-based distance / collaborative-based distance / PMI distance / category difference / taxonomic distance).
These differences imply that users may place different emphases on the two components of serendipity, i.e., relevance and unexpectedness, in different product domains. To be specific, it seems that users focus more on the unexpectedness component when choosing movies, and more on the relevance component when searching for e-commerce products. The difference might be explained from multiple perspectives: (1) According to Maslow's hierarchy of needs (Maslow 1943), human needs can be classified into five levels: physiological needs (e.g., food and water), safety needs (e.g., security), belongingness and love needs (e.g., family and friends), esteem needs (e.g., feelings of accomplishment), and self-actualization (e.g., conducting creative activities). In the case of movie recommendations (part of fulfilling users' self-actualization needs), users might be inclined to explore more diverse items to obtain new experiences, while for e-commerce products that mainly serve their daily physiological and safety needs, users might have lower tolerance for less relevant recommendations.
Table 10 Summary of all major observations

| Feature type | Feature | Observation |
| --- | --- | --- |
| Item-driven feature | Popularity | Less popular items are more likely to be perceived as serendipitous. |
| Profile-driven feature (Movielens and Taobao datasets) | Profile size | For e-commerce recommendations, the more interactions with items in the user's profile, the less likely she would perceive a new recommendation to be serendipitous. |
| | In-profile diversity | For movie recommendations, the more diverse the movies watched before, the more likely that the user would perceive a new recommendation to be serendipitous. However, for e-commerce recommendations, the more diverse the purchases made, the more difficult it is for the user to feel a new recommendation to be serendipitous. |
| Category-level (Movielens and Taobao datasets) | Interaction | For movie recommendations, movie categories that users have interacted less with are more likely to be serendipitous to them. However, the opposite is the case for e-commerce recommendations. |
| | Temporal interaction | Recommendation of items in a timely way can lead users to consider it to be more serendipitous. |
| | Time difference | For movie recommendations, the more recently the current recommendation's category has been visited by the target user, the less likely the user is to consider it serendipitous. However, the opposite is the case for e-commerce recommendations. |
| Item-level | Content-based distance (min/rec) | Recommending an e-commerce item with a shorter minimal/recent content-based distance to the target user's profile can help increase its perceived serendipity. |
| | Collaborative-based distance (min/rec) | For movie recommendations, the larger the distance is, the more serendipitous the item is to users. However, the opposite is the case for e-commerce recommendations. |
| | PMI distance (min) | An e-commerce recommendation that is of higher probability to be observed together with items in the user's profile will be more serendipitous to the user. |
| | PMI distance (rec) | A movie recommendation that is of lower probability to be observed together with the most recent item in the user's profile will be more serendipitous to the user, but the opposite is the case for e-commerce recommendations. |
| | Category difference (min/rec) | The minimal/recent category difference of a highly serendipitous item from the user profile is shorter. |
| | Taxonomic distance (min) | The minimal taxonomic distance from the highly serendipitous item to the click-based user profile is shorter. |
| | Taxonomic distance (rec) | The recent taxonomic distance from the highly serendipitous item to the click-based/purchase-based user profile is shorter. |
| User characteristic (applicable only to Taobao dataset) | Age (demographic characteristic) | Older people are more inclined to feel a recommendation to be serendipitous. |
| | Gender (demographic characteristic) | Men are more inclined to feel a recommendation to be serendipitous. |
| | Curiosity | Highly curious users are more likely to enjoy serendipitous items. |
| | Big-Five personality traits | People who are of high Openness to Experience, Conscientiousness, Extraversion, or Neuroticism, or low Agreeableness are more inclined to perceive a recommendation as highly serendipitous. |

(2) From the aspect of product type, movies are mainly experience products that are evaluable after consumption and experience, while the majority of e-commerce products (e.g., clothing) are search products that can be evaluated after online searching and comparison (Nelson 1970). Therefore, the relevance of recommendations to user preference as reflected by the history may be more important for e-commerce users than for movie users before they make a decision.
(3) It can also be explained by the difference between hedonism and utilitarianism (Oliver and Raney 2011;Cooper-Martin 1991;Ford et al. 1988). The former refers to sensual self-indulgence and focuses on personal pleasure, and therefore considers fewer reality constraints, while the latter focuses on overall utility and therefore takes various reality constraints into account. From this point of view, when selecting and evaluating movie recommendations, users might think more about their emotional experience at that moment, whereas when evaluating e-commerce recommendations, the price/quality ratio of the item might be important.
Moreover, with the Taobao Serendipity Dataset, we compared the results based on click-based and purchase-based user profiles. The results indicate that, with the exception of the profile-driven feature in-profile diversity, the features that are significantly correlated with users' serendipity perceptions behave similarly in the two types of profile datasets. For example, the correlations of day-of-the-week interaction with users' perceived serendipity are positive in both datasets (0.084 and 0.043, respectively, p < 0.01). As for in-profile diversity, users who purchased from less diverse categories tend to perceive higher serendipity, whereas this tendency is less obvious in the click-based dataset. This suggests that diversity might be better measured from users' purchase records, given that clicking behavior could be aimless when shopping (Wikipedia Contributors 2021).
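The click-based versus purchase-based comparison can be illustrated with a small sketch. It assumes a rank correlation (Spearman's rho) between a feature value and the serendipity rating; the choice of correlation, the function names, and the toy numbers are all illustrative assumptions and are not taken from the datasets.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data, one entry per surveyed user (NOT real survey responses).
serendipity = [2, 4, 3, 5, 1, 4]    # perceived serendipity rating
dow_click = [0, 3, 1, 4, 0, 2]      # day-of-the-week interaction, click-based profile
dow_purchase = [0, 1, 1, 2, 0, 0]   # same feature, purchase-based profile

for name, feature in [("click", dow_click), ("purchase", dow_purchase)]:
    print(name, round(spearman(feature, serendipity), 3))
```

Comparing the sign and size of the two coefficients feature by feature is the kind of check that separates features behaving alike in both profile types from those, such as in-profile diversity, that do not.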
In the Taobao dataset, we were also able to analyze the relationships between several user characteristics and users' perceived serendipity (see Table 10). Both age and gender had significant effects: older users (above 30 years old) and male users are more likely to feel a recommendation to be serendipitous, whereas younger or female users are less likely to find a recommendation unexpected. In addition, our results verify that users with higher Curiosity, Openness, Conscientiousness, Extraversion, or Neuroticism, or with lower Agreeableness, are more likely to give a higher score on the recommendation's serendipity. The reason for this may be that the five personality traits are all related to curiosity to a certain extent (Kashdan et al. 2009).
We further analyzed the relative importance of all considered features (including user characteristics). The analysis shows that (1) in both the Movielens and Taobao datasets, item popularity is of relatively high importance, and (2) in the Taobao dataset, which contains both item features and user characteristics, user characteristics, especially Curiosity and Age, have greater importance than item features. Moreover, no matter which type of user profile (click-based or purchase-based) was considered, six user characteristics (i.e., Age, Gender, Curiosity, Openness, Conscientiousness, and Neuroticism) and five item features (i.e., item popularity, content-based distance (rec), category difference (min), time difference, and profile size) are all significant predictors of users' perceived serendipity.
In addition, the analysis of the effect of profile length (varying from 1 to 12 weeks) reveals that in both the Movielens and Taobao datasets, interaction-driven features (especially interaction, day-of-the-week interaction, content-based distance, collaborative-based distance, and PMI distance) have stronger correlations with users' perceived serendipity when using a shorter-term profile. This could be because factors leading to serendipity may change from time to time and therefore recent interactions are more important to the calculation of interaction-driven features. However, three features were differently affected by the profile length in the Movielens and Taobao datasets. Profile size and time-of-the-day interaction, which are correlated with users' serendipity perceptions with a certain length of user profile in the Taobao dataset, have no significant correlation in the Movielens dataset. In-profile diversity is significantly correlated with users' serendipity perceptions in the Movielens dataset only when a longer-term profile was used (i.e., 11-week or 12-week profile); whereas in the click-based Taobao dataset, it shows stronger correlation as the profile gets longer.
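The profile-length analysis can be pictured as recomputing each interaction-driven feature on a profile truncated to the most recent k weeks. The sketch below does this for the category-level interaction feature, assumed here to be a simple count of past interactions with the recommended item's category; the function name, the count-based definition, and the sample history are illustrative assumptions.

```python
from datetime import datetime, timedelta


def category_interaction(history, target_category, rec_time, weeks):
    """Count interactions with target_category in the `weeks` weeks before rec_time.

    history: iterable of (timestamp, category) pairs from the user's profile.
    """
    window_start = rec_time - timedelta(weeks=weeks)
    return sum(
        1
        for ts, category in history
        if window_start <= ts < rec_time and category == target_category
    )


# Hypothetical profile of one user (illustrative values only).
rec_time = datetime(2021, 3, 29, 20, 0)
history = [
    (datetime(2021, 3, 27, 21, 0), "comedy"),  # roughly 2 days before
    (datetime(2021, 3, 10, 19, 0), "drama"),   # roughly 19 days before
    (datetime(2021, 2, 20, 18, 0), "comedy"),  # roughly 37 days before
    (datetime(2021, 1, 5, 22, 0), "comedy"),   # roughly 83 days before
]

# Feature value under profile lengths of 1..12 weeks.
by_length = {
    weeks: category_interaction(history, "comedy", rec_time, weeks)
    for weeks in range(1, 13)
}
print(by_length)
```

The per-length feature values would then be correlated with the serendipity ratings separately for each k, which is where a shorter-term profile can yield a stronger correlation than a longer one.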

Practical implications
The findings of this study could be constructive in the design of more effective serendipity-oriented recommendation algorithms. For instance, item popularity is a domain-independent feature that negatively correlates with users' perceived serendipity in both the Movielens and Taobao datasets. This suggests that the methods with the objective of achieving recommendation serendipity based on this feature (Lu et al. 2012a;Kawamae 2010;Zheng et al. 2015) could be generalized to different domains (at least movies and e-commerce products). Moreover, some special features considered in this work, such as day-of-the-week interaction that is also domain-independent in terms of influencing users' serendipity perceptions, could be usefully incorporated into a classical recommendation framework to strengthen its serendipity. For example, the method based on SVD++ to generate serendipitous recommendations (Lu et al. 2012b;Zheng et al. 2015) could further be enhanced by adding a vector representing the user's day-of-the-week preference for each latent factor.
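One way to read the SVD++ suggestion is to add a per-user, per-weekday latent vector to the user representation before taking the dot product with the item factors. The sketch below shows only the prediction step under that reading; the parameter names, the additive placement of the weekday vector, and the toy values are assumptions rather than the formulation of the cited methods.

```python
import math


def predict_rating(mu, b_u, b_i, q_i, p_u, y_implicit, d_u_by_day, weekday):
    """SVD++-style prediction with an extra day-of-the-week user vector.

    mu: global mean; b_u, b_i: user and item biases.
    q_i: item factors; p_u: explicit user factors.
    y_implicit: factor vectors of the items in the user's implicit feedback set.
    d_u_by_day: seven user vectors (Monday=0 .. Sunday=6); this is the assumed extension.
    weekday: index of the day on which the recommendation is shown.
    """
    n = len(y_implicit)
    norm = 1.0 / math.sqrt(n) if n else 0.0
    user_vec = [
        p_u[f] + norm * sum(y[f] for y in y_implicit) + d_u_by_day[weekday][f]
        for f in range(len(q_i))
    ]
    return mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user_vec))


# Hypothetical parameters for one user and one item (illustrative values only).
q_i = [1.0, 0.5]
p_u = [0.2, 0.4]
y_implicit = [[0.1, 0.0], [0.3, 0.2], [0.0, 0.1], [0.0, 0.1]]
d_u_by_day = [[0.0, 0.0] for _ in range(7)]
d_u_by_day[5] = [0.2, 0.2]  # this user is assumed to be more receptive on Saturdays

monday = predict_rating(3.5, 0.1, -0.2, q_i, p_u, y_implicit, d_u_by_day, 0)
saturday = predict_rating(3.5, 0.1, -0.2, q_i, p_u, y_implicit, d_u_by_day, 5)
print(monday, saturday)
```

In a full model the weekday vectors would be learned jointly with the other parameters; they are fixed by hand here only to show how the same item can receive a different score on different days.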
On the other hand, care should be taken in utilizing domain-specific features. For instance, although in both datasets we considered a fixed 3-month user profile to extract those item features, users' activity degrees actually varied between the two datasets. Taobao users were more active than Movielens users according to the average number of visited items. This may partially explain why some features, such as time-of-the-day interaction measuring the frequency that the user has interacted with items on the same day as the target item, give significant correlation with users' serendipity perceptions in the Taobao dataset, but not in the Movielens dataset. However, other features such as time difference that do not rely on the activity degree might be directly used to build domain-specific recommendation algorithms for reflecting users' serendipity preferences.
Our findings also verified the impacts of several user characteristics, especially curiosity and Big-Five personality traits, on users' perceptions of recommendation serendipity. This implies that related methods (Shen et al. 2020;Kotkov et al. 2016) based on such characteristics might achieve the serendipity objective from the users' perspective. However, because our validation was done with only one dataset due to the lack of user data in other domains, more studies might be needed before employing those methods in real situations.

Limitations of our work
There are two major limitations of this work. Firstly, although the study investigated 11 item features and 8 user characteristics, the findings suggest that the interactions between some features, such as time difference and category difference, are worth investigating in depth, because their combined effect may be stronger than the effect of each considered separately. Moreover, for user characteristics, our study is limited to the factors that have received the most attention, i.e., curiosity and Big-Five personality traits. In the future, other characteristics such as users' decision-making styles (Sproles and Kendall 1986) and need for cognition (Cacioppo and Petty 1982) deserve further investigation. Another limitation is that, because of the limited data size, we only measured the influence of profile length up to 12 weeks; as the results for time-of-the-day interaction imply, its correlation with users' serendipity perceptions might increase further with a longer profile.
Secondly, the two datasets, although collected in two different product domains, differ in terms of other potential factors such as users' personal background, suggesting that those identified domain differences might not be simply explained by the domain specialty itself. Besides, the Movielens dataset was based on a retrospective survey, which may not precisely indicate users' perception of serendipity on site. Ideally, the comparison should be done with the same group of users when they are using a recommender system, but as this is not feasible now due to the unavailability of such datasets, we expect that more studies could be done in the future to address this issue.

Conclusions
In order to understand how item features and user characteristics may affect users' perceptions of recommendation serendipity, in this study we analyzed two publicly accessible user survey datasets: the Movielens Serendipity Dataset (Kotkov et al. 2018a) and the Taobao Serendipity Dataset (Chen et al. 2019b;Wang et al. 2020). Through various statistical analyses, we have not only identified the significant effects of several types of features (i.e., item-driven, profile-driven, and interaction-driven), but also analyzed all of the features' relative importance, as well as the effects of user characteristics in the Taobao dataset and the influence of user profile length in both datasets.
Concretely, in response to the three research questions we raised at the beginning, there are several interesting findings: (1) Regarding what item features can be correlated with users' perceived serendipity, we found that two particular item features (i.e., popularity and temporal interaction) perform similarly between the Movielens and Taobao datasets. However, six features have opposite effects in the two domains (i.e., in-profile diversity, interaction, time difference, minimal collaborative-based distance, recent collaborative-based distance, and recent PMI distance), and four have significant effects in only one dataset (i.e., profile size, minimal content-based distance, recent content-based distance, and minimal PMI distance for e-commerce recommendations). We further investigated the differences between the two types of Taobao dataset, i.e., click-based and purchase-based, in terms of those features' exact roles.
(2) As for what user characteristics can affect users' perceived serendipity, which was only analyzable in the Taobao dataset, we found that older people, men, or more curious users (who are essentially more open, more conscientious, more extroverted, and more neurotic in terms of their Big-Five personality traits) are more inclined to perceive a recommendation as highly serendipitous. (3) When putting all of the considered features together to predict users' serendipity perception, the results show that, in the Movielens dataset, popularity, day-of-the-week interaction, and interaction are three significant predictors, while curiosity, age, and popularity are more predictive in the Taobao dataset. (4) Moreover, the features' correlation with users' serendipity perception can vary with the length of the user profile. In particular, the shorter the profile is, the more strongly interaction-driven features (e.g., interaction and time-of-the-day interaction) correlate with users' perceived serendipity, whereas for longer user profiles, the profile-driven item feature (i.e., profile size or in-profile diversity) is more strongly correlated with users' perceived serendipity.
We discussed the reasons behind those findings and their practical implications for developing more effective serendipity-oriented recommendation algorithms that take the dataset's domain properties into account. In the future, we plan to address the limitations noted in Sect. 7.3. We are also interested in incorporating those effective features into the implementation of a real serendipity-oriented recommender system and measuring the system's practical performance in comparison with state-of-the-art methods.