Cold start (recommender systems)

The cold start problem is a well known and well researched problem for recommender systems. Recommender systems form a specific type of information filtering (IF) technique that attempts to present information items (e-commerce, films, music, books, news, images, web pages) that are likely of interest to the user. Typically, a recommender system compares the user's profile to some reference characteristics. These characteristics may be related to item characteristics (content-based filtering) or the user's social environment and past behavior (collaborative filtering). Depending on the system, the user can be associated to various kinds of interactions: ratings, bookmarks, purchases, likes, number of page visits etc. There are three cases of cold start: This raises another issue, which is not anymore related to new items, but rather to unpopular items. In some cases (e.g. movie recommendations) it might happen that a handful of items receive an extremely high number of interactions, while most of the items only receive a fraction of them. This is referred to as popularity bias. In the context of cold-start items the popularity bias is important because it might happen that many items, even if they have been in the catalogue for months, received only a few interactions. This creates a negative loop in which unpopular items will be poorly recommended, therefore will receive much less visibility than popular ones, and will struggle to receive interactions. While it is expected that some items will be less popular than others, this issue specifically refers to the fact that the recommender has not enough collaborative information to recommend them in a meaningful and reliable way. Content-based filtering algorithms, on the other hand, are in theory much less prone to the new item problem. Since content based recommenders choose which items to recommend based on the feature the items possess, even if no interaction for a new item exist, still its features will allow for a recommendation to be made. This of course assumes that a new item will be already described by its attributes, which is not always the case. Consider the case of so-called editorial features (e.g. director, cast, title, year), those are always known when the item, in this case movie, is added to the catalogue. However, other kinds of attributes might not be e.g. features extracted from user reviews and tags. Content-based algorithms relying on user provided features suffer from the cold-start item problem as well, since for new items if no (or very few) interactions exist, also no (or very few) user reviews and tags will be available. New user The new user case refers to when a new user enrolls in the system and for a certain period of time the recommender has to provide recommendation without relying on the user's past interactions, since none has occurred yet. This problem is of particular importance when the recommender is part of the service offered to users, since a user who is faced with recommendations of poor quality might soon decide to stop using the system before providing enough interaction to allow the recommender to understand his/her interests. The main strategy in dealing with new users is to ask them to provide some preferences to build an initial user profile. A threshold has to be found between the length of the user registration process, which if too long might induce too many users to abandon it, and the amount of initial data required for the recommender to work properly. Similarly to the new items case, not all recommender algorithms are affected in the same way. Item-item recommenders will be affected as they rely on user profile to weight how relevant other user's preferences are. Collaborative filtering algorithms are the most affected as without interactions no inference can be made about the user's preferences. User-user recommender algorithms behave slightly differently. A user-user content based algorithm will rely on user's features (e.g. age, gender, country) to find similar users and recommend the items they interacted with in a positive way, therefore being robust to the new user case. Note that all these information is acquired during the registration process, either by asking the user to input the data himself, or by leveraging data already available e.g. in his social media accounts. ==Mitigation strategies==