===Collaborative filtering===
One approach to the design of recommender systems that has wide use is collaborative filtering. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items to those they liked in the past. The system generates recommendations using only information about the rating profiles of different users or items. By locating peer users or items with a rating history similar to that of the current user or item, it generates recommendations from this neighborhood. This approach is a cornerstone of e-commerce sites, which analyze the purchasing patterns of thousands of users to suggest items a given shopper might like. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of a memory-based approach is the user-based algorithm, and a well-known example of a model-based approach is matrix factorization. A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content, and therefore it can accurately recommend complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used to measure user similarity or item similarity in recommender systems, for example the k-nearest neighbor (k-NN) approach and the Pearson correlation, as first implemented by Allen. When building a model from a user's behavior, a distinction is often made between explicit and
implicit forms of data collection.
Examples of explicit data collection include the following:
• Asking a user to rate an item on a sliding scale.
• Asking a user to search.
• Asking a user to rank a collection of items from favorite to least favorite.
• Presenting two items to a user and asking him/her to choose the better one.
• Asking a user to create a list of items that he/she likes (see Rocchio classification or other similar techniques).
Examples of implicit data collection include the following:
• Observing the items that a user views in an online store, media library, or other repository of media.
• Analyzing item/user reading, viewing, or listening times (often formalized as dwell time).
• Keeping a record of the items that a user purchases (or considers purchasing) online.
• Obtaining a list of items that a user has read, listened to, or watched on his/her computer.
• Analyzing the user's social network and discovering patterns of likes and dislikes.
These implicit signals, particularly continuous variables such as dwell time, are increasingly central to evaluating user satisfaction. For instance,
YouTube adopted dwell time as its dominant ranking signal in 2012 to optimize its video recommendations. More advanced models build on this by refining how results are ranked according to the precise duration of a user's interaction with an individual media item. Machine learning systems, though advancing quickly, do not always or consistently outperform mathematical models built on the variables described above. When deployed to replace traditional algorithmic recommender systems, such systems must be carefully monitored and tuned for application-specific risks, and they are rarely plug-and-play. Common problems with current AI-driven recommender models include epoch reference lag (frequently studied in the algorithmic literature as temporal concept drift; for example, recommending an item the user has already purchased), taste taxonomy errors (resulting in over-specialization or semantic mismatch, such as assuming that a customer who likes red lipstick will want a red coffeemaker), and causality misattribution (where algorithms lock onto spurious correlations, such as assuming that because the last four items a user bought were made in China, she only wants to see products made in China). Collaborative filtering approaches often suffer from three problems:
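cold start, scalability, and sparsity.
• Cold start: For a new user or item, there is not enough data to make accurate recommendations. One commonly implemented solution to this problem is the multi-armed bandit algorithm.
• Scalability: Many of the environments in which these systems make recommendations contain millions of users and items, so computing recommendations can require a large amount of computational power.
• Sparsity: Even the most active users rate only a small fraction of the available items, so even popular items may have few ratings.
Many social networks originally used collaborative filtering to recommend new friends, groups, and other social connections by examining the network of connections between a user and their friends. Collaborative filtering is still used as part of hybrid systems, and it can employ embeddings, a machine learning technique that represents users and items as dense numeric vectors.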
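The memory-based, user-based algorithm with Pearson similarity and a k-nearest-neighbor neighborhood can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the toy ratings, the user and item names, and k=2 are hypothetical, similarity is computed only over co-rated items, only positively correlated neighbors are kept, and the prediction uses the common mean-centered weighting (the user's mean rating plus the similarity-weighted average of the neighbors' deviations from their own means).

```python
from __future__ import annotations

import math

# Hypothetical toy data: ratings[user][item] on a 1-5 scale.
Ratings = dict[str, dict[str, float]]
ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
    "dave": {"m1": 5, "m2": 3, "m4": 4},
}


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(data: Ratings, u: str, v: str) -> float:
    """Pearson correlation of two users over the items both have rated."""
    common = data[u].keys() & data[v].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mu_u = mean(data[u][i] for i in common)
    mu_v = mean(data[v][i] for i in common)
    num = sum((data[u][i] - mu_u) * (data[v][i] - mu_v) for i in common)
    den_u = math.sqrt(sum((data[u][i] - mu_u) ** 2 for i in common))
    den_v = math.sqrt(sum((data[v][i] - mu_v) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0


def predict(data: Ratings, user: str, item: str, k: int = 2) -> float | None:
    """Predict a rating from the k most similar users who rated the item."""
    candidates = sorted(
        ((pearson(data, user, other), other)
         for other in data if other != user and item in data[other]),
        reverse=True,  # most similar first
    )
    neighbors = [(s, o) for s, o in candidates if s > 0][:k]
    if not neighbors:
        return None  # no usable neighborhood for this user and item
    # Weighted average of neighbors' mean-centered ratings, added to the user's mean.
    num = sum(s * (data[o][item] - mean(data[o].values())) for s, o in neighbors)
    den = sum(s for s, _ in neighbors)
    return mean(data[user].values()) + num / den


print(round(predict(ratings, "alice", "m4"), 2))  # prints 4.58
```

Here carol's ratings are negatively correlated with alice's, so only dave (correlation 1.0) and bob (about 0.87) form the neighborhood. Dropping negative correlations is a design choice; some variants keep them and weight by absolute similarity instead.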
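For the model-based side, matrix factorization can be sketched as learning a k-dimensional latent vector for each user and each item so that a rating is approximated by a global mean plus the dot product of the two vectors, trained here with stochastic gradient descent and L2 regularization. The hyperparameters (k, learning rate, regularization strength, epoch count), the global-mean term, and the toy ratings are illustrative assumptions; practical systems usually add per-user and per-item bias terms and tune these values on held-out data.

```python
import math
import random

# (user index, item index, rating); a hypothetical format for this sketch.
Rating = tuple[int, int, float]


def train_mf(ratings: list[Rating], n_users: int, n_items: int, k: int = 2,
             lr: float = 0.01, reg: float = 0.02, epochs: int = 1000,
             seed: int = 0):
    """Fit mu + P[u]·Q[i] to the observed ratings with SGD and L2 regularization."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy, so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict_mf(mu, P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step that reduces squared error, with L2 shrinkage of the factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict_mf(mu, P, Q, u, i):
    """Predicted rating: global mean plus the dot product of latent vectors."""
    return mu + sum(p * q for p, q in zip(P[u], Q[i]))


# Hypothetical toy data: 4 users, 4 items, 14 observed ratings.
observed = [(0, 0, 5), (0, 1, 3), (0, 2, 4),
            (1, 0, 4), (1, 1, 2), (1, 2, 4), (1, 3, 5),
            (2, 0, 1), (2, 1, 5), (2, 2, 2), (2, 3, 1),
            (3, 0, 5), (3, 1, 3), (3, 3, 4)]
mu, P, Q = train_mf(observed, n_users=4, n_items=4)


def rmse(predict) -> float:
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed))


baseline = rmse(lambda u, i: mu)  # always predict the global mean
fitted = rmse(lambda u, i: predict_mf(mu, P, Q, u, i))
print(f"baseline RMSE {baseline:.2f}, fitted RMSE {fitted:.2f}")
print("predicted rating of item 2 by user 3:", round(predict_mf(mu, P, Q, 3, 2), 2))
```

Unlike the memory-based approach, the ratings are needed only during training; afterwards each prediction is a single dot product, which is one reason factorization models scale well.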
===Content-based filtering===
Another common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item and a profile of the user's preferences. These methods are best suited to situations where there is known data on an item (name, location, description, etc.) but not on the user. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user's likes and dislikes based on an item's features. In this system, keywords are used to describe the items, and a user profile is built to indicate the type of item this user likes. In other words, these algorithms try to recommend items similar to those that a user liked in the past or is examining in the present. The approach does not rely on a user sign-in mechanism to generate this often temporary profile. In particular, various candidate items are compared with items previously rated by the user, and the best-matching items are recommended. This approach has its roots in information retrieval and information filtering research.
To create a user profile, the system mostly focuses on two types of information:
• A model of the user's preferences.
• A history of the user's interactions with the recommender system.
Basically, these methods use an item profile (i.e., a set of discrete attributes and features) characterizing the item within the system. To abstract the features of the items in the system, an item representation algorithm is applied. A widely used algorithm is the
tf–idf representation (also called vector space representation). The system creates a content-based profile of users based on a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of techniques. Simple approaches use the average values of the rated item vectors, while more sophisticated methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision trees, and
artificial neural networks to estimate the probability that the user is going to like the item.
A key issue with content-based filtering is whether the system can learn user preferences from users' actions regarding one content source and use them across other content types. When the system is limited to recommending content of the same type as the user is already using, the value from the recommendation system is significantly less than when other content types from other services can be recommended. For example, recommending news articles based on news browsing is useful, but it would be much more useful if music, videos, products, discussions, and so on from different services could be recommended based on news browsing. To overcome this, most content-based recommender systems now use some form of hybrid system.
Content-based recommender systems can also include opinion-based recommender systems. In some cases, users are allowed to leave text reviews or feedback on the items. These user-generated texts are implicit data for the recommender system because they are potentially rich resources of both the features or aspects of the item and users' evaluations of, or sentiment toward, the item. Features extracted from user-generated reviews serve as improved item metadata: like conventional metadata, they reflect aspects of the item, and because they come from reviews, they highlight the aspects that users care about most. Sentiments extracted from the reviews can be seen as users' rating scores on the corresponding features. Popular approaches to opinion-based recommendation use various techniques, including text mining, information retrieval, sentiment analysis (see also multimodal sentiment analysis), and deep learning.
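The basic content-based pipeline (tf–idf item vectors, a user profile formed by averaging the vectors of items the user liked, and ranking of unseen items by similarity to that profile) can be sketched as follows. This is a minimal sketch under simplifying assumptions: the item descriptions are hypothetical keyword strings, tokenization is a plain whitespace split, idf is the unsmoothed log(N / df), and cosine similarity is used as the matching score.

```python
import math
from collections import Counter

Vector = dict[str, float]

# Hypothetical keyword descriptions of four items.
items = {
    "i1": "space opera starship battle",
    "i2": "romantic comedy wedding",
    "i3": "starship exploration alien planet",
    "i4": "wedding drama family",
}


def tfidf_vectors(docs: dict[str, str]) -> dict[str, Vector]:
    """Weight each keyword by term frequency times log inverse document frequency."""
    tokens = {key: text.split() for key, text in docs.items()}
    n_docs = len(tokens)
    df = Counter(term for toks in tokens.values() for term in set(toks))
    return {
        key: {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in Counter(toks).items()}
        for key, toks in tokens.items()
    }


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_profile(vectors: dict[str, Vector], liked: list[str]) -> Vector:
    """Simple averaging approach: mean of the liked items' feature vectors."""
    profile: Counter = Counter()
    for key in liked:
        for t, w in vectors[key].items():
            profile[t] += w / len(liked)
    return dict(profile)


def recommend(vectors: dict[str, Vector], liked: list[str], n: int = 2) -> list[str]:
    """Rank items the user has not seen by cosine similarity to the profile."""
    profile = user_profile(vectors, liked)
    scores = {k: cosine(profile, v) for k, v in vectors.items() if k not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:n]


vectors = tfidf_vectors(items)
print(recommend(vectors, ["i1"], n=1))  # prints ['i3']
```

A user who liked i1 is steered toward i3, the only other item sharing the keyword "starship". Replacing the averaging step with a trained classifier (for example a Bayesian classifier over the same vectors) is the more sophisticated route the text mentions.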
===Hybrid recommendation approaches===
Most recommender systems now use a hybrid approach, combining collaborative filtering, content-based filtering, and other approaches. E-commerce platforms frequently use hybrid approaches to overcome problems such as the cold start problem, where a new user has no history for collaborative filtering to analyze. There is no reason why several different techniques of the same type could not be hybridized. Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model.
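Making the two kinds of prediction separately and then combining them can be sketched as a weighted blend of scores, with a switch to the content-based scores alone when a user has too little history for collaborative filtering. This is a minimal sketch: it assumes both components have already produced scores normalized to [0, 1] for each candidate item, and the blend weight `alpha` and threshold `min_ratings` are hypothetical tuning parameters.

```python
Scores = dict[str, float]


def weighted(cf: Scores, cb: Scores, alpha: float = 0.7) -> Scores:
    """Weighted hybrid: linear blend of two score maps (missing scores count as 0)."""
    return {
        item: alpha * cf.get(item, 0.0) + (1 - alpha) * cb.get(item, 0.0)
        for item in cf.keys() | cb.keys()
    }


def hybrid(cf: Scores, cb: Scores, n_user_ratings: int,
           alpha: float = 0.7, min_ratings: int = 5) -> Scores:
    """Switching hybrid: use content-based scores alone for cold-start users."""
    if n_user_ratings < min_ratings:
        return dict(cb)
    return weighted(cf, cb, alpha)


def top_n(scores: Scores, n: int = 3) -> list[str]:
    return sorted(scores, key=scores.get, reverse=True)[:n]


cf_scores = {"a": 0.9, "b": 0.2}          # item "c" is too new for collaborative scores
cb_scores = {"a": 0.4, "b": 0.8, "c": 0.6}
print(top_n(hybrid(cf_scores, cb_scores, n_user_ratings=40)))  # prints ['a', 'b', 'c']
print(top_n(hybrid(cf_scores, cb_scores, n_user_ratings=2)))   # prints ['b', 'c', 'a']
```

Treating a missing score as 0 penalizes items that one component cannot score, such as new items in collaborative filtering; a real system might instead renormalize the weights over the components that do produce a score.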
Netflix is a good example of the use of hybrid recommender systems. The website makes recommendations by comparing the watching and searching habits of similar users (i.e., collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering). Some hybridization techniques include:
• Weighted: Combining the scores of different recommendation components numerically.
• Switching: Choosing among recommendation components and applying the selected one.
• Mixed: Recommendations from different recommenders are presented together to give the recommendation.
• Cascade: Recommenders are given strict priority, with the lower-priority ones breaking ties in the scoring of the higher-priority ones.
• Meta-level: One recommendation technique is applied and produces some sort of model, which is then the input used by the next technique.
==Technologies==