Elena's AI Blog

Recommender Systems

08 May 2024 / 21 minutes to read

Elena Daehnhardt


Midjourney AI art, May 2024
I am still working on this post, which is mostly complete. Thanks for your visit!


Introduction

Recommendation systems are algorithms that suggest relevant items to users. Depending on the application, these items could be movies, songs, products, or anything else. Two of the most common approaches to building recommendation systems are collaborative filtering and content-based filtering.

This post covers the essentials of building recommendation systems, including some theory and practical Python implementation. Let’s go!

Recommendation task

When we create Recommender Systems (RS), we assume a set of users and a set of items that can be recommended to those users. In practice, we also have a prior history of user ratings, which is used to generate suggestions, or recommendations.

Consider a movie recommender, a widely cited example of a recommender system. For instance, Netflix users watch content and rate the movies they watch. Netflix knows which movies each user prefers and recommends not-yet-seen movies that the user will (ideally :) like.

A basic RS stores user ratings in a matrix, such as:

[Movies\Users]  | User 1 | User 2 | User 3 | ... | User N |
----------------|--------|--------|--------|-----|--------|
Movie 1         |   10   |    4   |   6    | ... |   9    |
Movie 2         |   ?    |    7   |   9    | ... |   7    |
Movie 3         |   7    |    9   |   6    | ... |   ?    |
Movie 4         |   ?    |    ?   |   9    | ... |   7    |

Notice that User 1 did not watch Movie 2 (we have a “?” question mark in the cell of the rating table), and User N did not watch Movie 3.

We have to predict the missing user ratings; this task is called rating prediction. Once all the missing ratings are predicted, we can recommend the movies with the highest predicted ratings.
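As a minimal sketch, the toy table above could be held in NumPy with np.nan marking the unknown ratings (the variable name toy_ratings is just for this illustration):

import numpy as np

# Toy rating matrix: rows are movies, columns are users (User 1..User 4 here);
# np.nan marks the ratings we still need to predict
toy_ratings = np.array([
    [10,     4,      6,      9],       # Movie 1
    [np.nan, 7,      9,      7],       # Movie 2
    [7,      9,      6,      np.nan],  # Movie 3
    [np.nan, np.nan, 9,      7],       # Movie 4
])

# A Boolean mask of the missing entries is the rating prediction target
missing = np.isnan(toy_ratings)
print(missing.sum(), "ratings to predict")  # 4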

Indeed, not all recommenders use this matrix format in practice. Data structures and algorithms must be optimised for effective resource management and reduced computation time.

Recommendations should be created quickly with scalability in mind. This is particularly important when dealing with big data in production settings. However, let’s keep it simple and consider the rating matrix structure.

Collaborative Filtering

One of the most used recommender approaches is collaborative filtering. Collaborative filtering recommends items based on other users’ preferences. It can be further divided into two sub-categories: user-based and item-based.

User-based Collaborative Filtering

This approach suggests items by finding similar users. It assumes that if users A and B rate items similarly, A is likely to share B's opinion on an item that A hasn't seen yet; see Collaborative Filtering on Wikipedia.

In many information retrieval tasks, such as recommender systems, the items or products we want to recommend (or find similar items to) can be represented as vectors. This means we can use mathematical operations to compare these vectors and measure their similarity.

One common technique is cosine similarity, which calculates the cosine of the angle between two vectors. When the angle between two vectors is small, cosine similarity returns a value close to 1, indicating that they are very similar. When the angle approaches 90 degrees, cosine similarity returns a value close to 0, indicating that the vectors are unrelated.

Cosine similarity is widely used in data science to find similar documents, recommend products, and identify similar users. It helps build effective recommendation systems that suggest new items to users based on their past preferences or behaviors.

To give you more detail on why cosine similarity is such a powerful method, I asked Google's Gemini the following:

Why is cosine similarity used for building recommender systems?

Recommender systems use cosine similarity because it's a way to quantify how similar items are to each other. Here's how it works:

* **Imagine user profiles or items as vectors in a high-dimensional space.** Each dimension represents a feature, like genre for movies or purchase history for users.
* **Cosine similarity then calculates the cosine of the angle between those two vectors.** A high cosine similarity score indicates the vectors are pointing in similar directions, meaning the items or users are similar.

This is useful for recommendations because:

* **It identifies items with similar characteristics.** If a user likes items with a specific feature (e.g., action movies), cosine similarity can recommend other items with that feature (e.g., other action movies).
* **It works well with sparse data.** Recommender systems often deal with data where users haven't interacted with most items. Cosine similarity focuses on the direction of the vector, not its magnitude, making it effective even with missing data points.

However, cosine similarity also has limitations:

* **It doesn't consider item or user ratings.** It only focuses on item features, not how much a user liked something similar.
* **It doesn't capture complex relationships.** Real-world preferences can be nuanced. Cosine similarity might miss these subtleties.

Overall, cosine similarity is a powerful tool for building recommender systems by finding similar items based on their features. It's a good starting point, but other techniques are often combined with it for more comprehensive recommendations. 

Imagine you have two vectors, like arrows in space. Cosine similarity calculates the angle between those two arrows. A smaller angle indicates the vectors are more similar, and a larger angle means they’re less similar.

The cosine similarity calculation is available in sklearn:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sample user-item rating matrix
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

# Compute the cosine similarity between users
user_similarity = cosine_similarity(ratings)
print(user_similarity)
[[1.         0.86091606 0.42289003 0.36896403 0.18257419]
 [0.86091606 1.         0.42008403 0.47058824 0.14969624]
 [0.42289003 0.42008403 1.         0.98019606 0.62360956]
 [0.36896403 0.47058824 0.98019606 1.         0.59878495]
 [0.18257419 0.14969624 0.62360956 0.59878495 1.        ]]

The actual value of cosine similarity is the cosine of that angle. Cosine ranges from -1 to 1: 1 means the vectors point in the same direction (0-degree angle), 0 means they are orthogonal (90-degree angle), and -1 means they point in opposite directions (180-degree angle).
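A tiny sketch makes these three boundary cases concrete (the vectors and the helper name cosine are made up purely for illustration):

import numpy as np

def cosine(a, b):
    # Cosine of the angle between two vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
print(cosine(a, np.array([2.0, 0.0])))   # 1.0  -> same direction (0 degrees)
print(cosine(a, np.array([0.0, 3.0])))   # 0.0  -> orthogonal (90 degrees)
print(cosine(a, np.array([-1.0, 0.0])))  # -1.0 -> opposite (180 degrees)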

Please refer to the sklearn documentation for sklearn.metrics.pairwise.cosine_similarity.

Alternatively, we can use NumPy to calculate the cosine similarity between user1 and user2, using the “ratings” matrix above.

import numpy as np

def cosine_similarity_two_users(user1, user2):
    # Cosine similarity: dot product divided by the product of vector norms
    return np.dot(user1, user2) / (np.linalg.norm(user1) * np.linalg.norm(user2))

user1 = ratings[0]
user2 = ratings[1]

cosine_similarity_two_users(user1, user2)
0.8609160647753271

In short, this approach recommends items by finding similar users. This is often measured by observing the items that similar users have liked.

Let’s implement a simple user-based collaborative filtering recommendation system:

import numpy as np

# Sample user-item matrix (note: row 2 differs slightly from the earlier sample,
# so the resulting similarities differ too)
ratings = np.array([[5, 3, 0, 1], [4, 0, 3, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]])

# Compute the cosine similarity between users manually: dot products of
# user vectors, divided by the outer product of the vector norms
similarity = np.dot(ratings, ratings.T)
norms = np.array([np.sqrt(np.diagonal(similarity))])
similarity = similarity / norms / norms.T

print(similarity)
[[1.         0.69614322 0.42289003 0.36896403 0.18257419]
 [0.69614322 1.         0.33968311 0.3805212  0.57496616]
 [0.42289003 0.33968311 1.         0.98019606 0.62360956]
 [0.36896403 0.3805212  0.98019606 1.         0.59878495]
 [0.18257419 0.57496616 0.62360956 0.59878495 1.        ]]
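To turn this similarity matrix into a rating prediction, one common recipe is a similarity-weighted average of the other users' ratings. The helper below, which continues from the snippet above, is a minimal sketch of that idea (real systems typically add mean-centring and restrict to a neighbourhood of the most similar users); the function name predict_rating is hypothetical:

def predict_rating(ratings, similarity, user, item):
    # Similarity-weighted average over the users who rated this item
    others = [u for u in range(ratings.shape[0])
              if u != user and ratings[u, item] > 0]
    if not others:
        return 0.0  # no information available for this item
    sims = similarity[user, others]
    rated = ratings[others, item]
    return np.dot(sims, rated) / np.sum(np.abs(sims))

# Predict user 0's missing rating for item 2
print(predict_rating(ratings, similarity, user=0, item=2))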

Item-based Collaborative Filtering

This method finds an item’s look-alike instead of a user’s. It measures the similarity between the items the target user rates or interacts with.

Notice that we can simply transpose the rating matrix and calculate the cosine similarity between items:

# Compute the cosine similarity between items
item_similarity = cosine_similarity(ratings.T)
print(item_similarity)
[[1.         0.73568078 0.31383947 0.35736521]
 [0.73568078 1.         0.25854384 0.4710412 ]
 [0.31383947 0.25854384 1.         0.51352592]
 [0.35736521 0.4710412  0.51352592 1.        ]]
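Given the item similarity matrix, a minimal (purely illustrative) recommendation step is to score each unseen item by the user's existing ratings, weighted by item similarity; the helper name recommend_items is hypothetical and continues from the snippets above:

def recommend_items(ratings, item_similarity, user, top_n=2):
    # Score every item by the user's ratings, weighted by item similarity
    user_ratings = ratings[user]
    scores = item_similarity @ user_ratings.astype(float)
    scores[user_ratings > 0] = -np.inf  # exclude items already rated
    ranked = np.argsort(scores)[::-1]
    return [i for i in ranked if np.isfinite(scores[i])][:top_n]

# Unseen items for user 0, best first
print(recommend_items(ratings, item_similarity, user=0))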

Matrix Factorization

Matrix Factorization (MF) is a collaborative filtering method widely used in recommendation systems. It decomposes the user-item interaction matrix into two lower-dimensional matrices: user and item latent feature matrices. The idea is that these latent features capture the underlying factors associated with user preferences and item characteristics.

Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes a matrix into three other matrices. It can help reduce dimensionality and extract latent factors related to users and items.

Scipy has an svds implementation that we can use like this:

from scipy.sparse.linalg import svds

# Converting ratings to floats
ratings_matrix = np.array(ratings, dtype=float)

# Assuming you have a user-item ratings matrix
u, s, vt = svds(ratings_matrix, k=3)

SVD is a mathematical technique used in various applications, including signal processing, statistics, semantic analysis, and, most notably, building recommendation systems.

When you call svds(ratings_matrix, k=3), you are asking it to decompose ratings_matrix into three matrices U, S, and V^T, where k=3 specifies the number of singular values and vectors to compute. This is particularly useful when dealing with large matrices, as it retains only the most significant features represented by the top k singular values, which helps with dimensionality reduction, data compression, and noise reduction.

Remember, the choice of k (the number of singular values to compute) can significantly affect the results of your analysis or application. Choosing the correct value of k involves balancing between approximation accuracy and computational efficiency or simplicity of the model.
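One informal way to explore this trade-off is to compare the reconstruction error for a few values of k, continuing from the snippet above; this is just a rough diagnostic, not a substitute for proper offline evaluation:

# Compare how well each k approximates the original matrix
for k in range(1, 4):  # svds requires k < min(ratings_matrix.shape)
    u_k, s_k, vt_k = svds(ratings_matrix, k=k)
    approx = u_k @ np.diag(s_k) @ vt_k
    error = np.linalg.norm(ratings_matrix - approx)
    print(f"k={k}: reconstruction error {error:.3f}")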

Here’s what U, S, and V^T represent:

  • U (left singular vectors): This is an m×k orthogonal matrix where m is the number of rows in the original matrix (e.g., users in a rating matrix). Each column can be seen as a “feature vector” for the rows.
print(u)
[[-0.17073573  0.74320914 -0.40899704]
 [ 0.48074594  0.3442087  -0.41930566]
 [-0.56001267 -0.22495939 -0.46080121]
 [-0.42477543 -0.18803495 -0.35858754]
 [ 0.49566551 -0.49314975 -0.56212224]]
  • S (singular values): This is a k×k diagonal matrix with non-negative real numbers on the diagonal, known as singular values. They give you an idea of each corresponding feature vector's "importance" or "weight" in U and V^T. By mathematical convention, singular values are listed in descending order; note that scipy's svds returns them in ascending order, as the output below shows.
print(s)
[4.53640842 5.81972146 9.41739755]
  • V^T (right singular vectors, transposed): This is a k×n orthogonal matrix where n is the number of columns in the original matrix (e.g., items in a rating matrix). It’s the transpose of V, where V contains columns that can be considered as “feature vectors” for the columns of the original matrix.
print(vt)
[[ 0.01863082 -0.12709489  0.86424435 -0.48639642]
 [ 0.80414264  0.25972348 -0.24625279 -0.47461342]
 [-0.48225601 -0.23891044 -0.43202256 -0.72367635]]

Here’s how you can use svds:

# The singular values 's' are returned as a 1D array for efficiency;
# convert it to a diagonal matrix for further computations if necessary
s_diag_matrix = np.diag(s)
s_diag_matrix
array([[4.53640842, 0.        , 0.        ],
       [0.        , 5.81972146, 0.        ],
       [0.        , 0.        , 9.41739755]])

Now you can use u, s_diag_matrix, and vt for various applications, such as reconstructing the original matrix or performing dimensionality reduction.

In the context of recommendation systems, these decomposed matrices can predict missing entries in the original rating matrix. This is done by approximating the original matrix as the product of U, S, and V^T, which can highlight underlying patterns in the data, such as similarities between users or items.

# Reconstruct the approximate ratings matrix
ratings_approx = np.dot(np.dot(u, s_diag_matrix), vt)
ratings_approx
array([[ 5.32652372,  2.18816428,  0.18504005,  1.00294508],
       [ 3.46567149,  1.32850063, -0.30280243,  0.99518063],
       [ 1.19208569,  0.52241748,  0.10885441,  5.00173251],
       [ 0.86190988,  0.34333338, -0.07825527,  3.9987545 ],
       [-0.05108206,  1.1270053 ,  4.97105194,  3.99953927]])
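Since the zeros in the original matrix marked unrated items, a minimal follow-up sketch is to recommend, for each user, the unrated item with the highest predicted value (continuing from the snippet above):

# Keep predictions only where the original rating was missing (zero)
unrated = ratings_matrix == 0
predictions = np.where(unrated, ratings_approx, -np.inf)
best_items = np.argmax(predictions, axis=1)
for user, item in enumerate(best_items):
    print(f"User {user}: recommend item {item} "
          f"(predicted rating {ratings_approx[user, item]:.2f})")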

The collaborative filtering recommendation approach helps capture user interests without needing the recommended item features. However, collaborative filtering has a cold-start problem when new, previously unseen items are introduced, as pointed out in Advantages and Disadvantages.

The cold-start problem arises when a recommender system has little to no data about new users or items and therefore struggles to make good recommendations for them.

Content-Based Filtering

Content-based filtering usually recommends items based on the features of the items (and/or a profile of the user’s preferences). It requires item features to be known in advance.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Sample item descriptions
descriptions = [
    "An action-packed journey in space",
    "A deep dive into the mysteries of the cosmos",
    "A heartwarming drama about family and relationships",
    "A documentary about elephants living in the desert",
    "A comedy about a family vacation"
]

# Convert text to TF-IDF features
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(descriptions)

# Compute cosine similarity between items
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

print(cosine_sim)
[[1.         0.         0.         0.         0.        ]
 [0.         1.         0.         0.         0.        ]
 [0.         0.         1.         0.         0.20923103]
 [0.         0.         0.         1.         0.        ]
 [0.         0.         0.20923103 0.         1.        ]]

Notice how the third movie (“A heartwarming drama about family and relationships”) and the fifth movie (“A comedy about a family vacation”) are rated as more similar to each other than to the rest of the movies?
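To turn this similarity matrix into recommendations, we can simply look up the most similar other items for a given item; the helper similar_items below is hypothetical, written just for this sketch:

import numpy as np

def similar_items(cosine_sim, item_index, top_n=2):
    # Rank the other items by their similarity to the given item
    scores = cosine_sim[item_index].copy()
    scores[item_index] = -1.0  # exclude the item itself
    return np.argsort(scores)[::-1][:top_n]

# Items most similar to the third description (index 2)
print(similar_items(cosine_sim, 2))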

Notice that content-based systems are scalable [5]. However, they might lack novelty, recommending movies too close to what the user already likes. See Content-based filtering, Advantages and Disadvantages.

Summary of Recommender System Approaches

The recommendation approaches above have their advantages and limitations.

To summarise them, I asked Google Gemini:

Create a Markdown table with Recommender System approaches, including content-based and collaborative filtering such as item-based and user-based in rows, and their advantages and disadvantages in columns.

The output of Google Gemini was:

| Approach | Advantages | Disadvantages |
|----------|------------|---------------|
| Content-Based Filtering | Easy to interpret; works well for new users (cold start problem) | Limited to similar items; ignores user-specific preferences |
| User-Based Collaborative Filtering | Captures user preferences effectively; can recommend niche items | Requires a large user base; cold start problem for new users |
| Item-Based Collaborative Filtering | Scales well with large item catalogs; can discover hidden relationships between items | Relies on user interaction history; less effective for new items |

Conclusion

Recommendation systems are complex yet fascinating tools that help personalise the user experience. Here, we have explored the basics of collaborative and content-based filtering and implemented them in Python.

In my next posts, I will discuss advanced recommendation systems using machine learning algorithms and deep learning applications. We will also learn about assessing the performance of recommender systems.

Should you like to explore recommender systems, their methods, challenges, and related research, you can also search Google Scholar. For instance, I liked the recent recommender research overview by I. Saifudin and T. Widiyaningtyas, published in IEEE Access (2024); see Systematic Literature Review on Recommender System: Approach, Problem, Evaluation Techniques, Datasets [6].

Please subscribe so you do not miss the new content!

Did you like this post? Please let me know if you have any comments or suggestions.


References

1. Scikit-learn Cosine Similarity Documentation

2. NumPy

3. scipy.sparse.linalg.svds

4. Collaborative filtering, Advantages and Disadvantages

5. Content-based filtering, Advantages and Disadvantages

6. Systematic Literature Review on Recommender System: Approach, Problem, Evaluation Techniques, Datasets


About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.




Citation
Elena Daehnhardt. (2024) 'Recommender Systems', daehnhardt.com, 08 May 2024. Available at: https://daehnhardt.com/blog/2024/05/08/recommender_system_approaches_with_python_code_collaborative_filtering_content_based/