What is a Recommendation System?
It’s one of the most popular data science applications. It’s a system that predicts the likelihood that a user will prefer an item, based on their past behavior. This can be done with a machine learning algorithm that predicts user preferences for a particular entity. Recommendation systems have a wide variety of applications, and many of the big technology companies use them to recommend products to their customers. For instance, Amazon uses recommendation systems for product recommendations, YouTube for video recommendations, Netflix and IMDB for movie recommendations, and Facebook and Twitter for friend recommendations.
- The diagram below demonstrates the recommender systems method.
Recommendation System Mechanism:
The engine of the recommendation system filters the data via different machine learning algorithms, and based on that filtering, it can predict the most relevant entities to be recommended. After studying the previous behaviors of the users, it recommends products/services that the user may be interested in.
The recommendation engine works in these 3 steps:
1- Data Collection: The techniques that can be used to collect data are:
- Explicit, where data are provided intentionally as information (e.g. user’s input such as movie ratings)
- Implicit, where data are not provided intentionally but are gathered from the available data streams (e.g. search history, clicks, order history, etc.)
2- Data Storage: The data can be stored in cloud storage such as a SQL database, a NoSQL database, or some kind of object storage, depending on the type and amount of data. The more data the storage can hold for the model, the better the recommendations can be.
3- Recommendation System Methods:
There are several methods in recommendation systems, but there are two major approaches to filter data on the system:
- Collaborative Filtering: It makes recommendations by combining your experience with the experiences of other people.
- Content-Based Filtering (the one that I used in implementing my movie recommendation system): It is based on product attributes, that is, the item descriptions and the preferences in the user’s profile. It calculates the similarity between different products on the basis of their attributes. It treats recommendation as a user-specific classification problem and learns a classifier for the user’s likes and dislikes based on product features.
- The diagram below demonstrates content-based filtering recommender systems.
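As a rough illustration of the similarity step in content-based filtering, the sketch below compares movies by their attribute vectors. The movie titles, the genre attributes, and the function names are made up for the example, and cosine similarity is one common choice of measure, not necessarily the one used in this project.

```python
import math

# Hypothetical catalogue: each movie is described by a set of genre attributes.
MOVIES = {
    "Movie A": {"Action", "Adventure"},
    "Movie B": {"Action", "Adventure", "Sci-Fi"},
    "Movie C": {"Drama", "Romance"},
}
GENRES = sorted({g for genres in MOVIES.values() for g in genres})


def to_vector(genres):
    """Encode a set of genres as a 0/1 vector over the GENRES vocabulary."""
    return [1.0 if g in genres else 0.0 for g in GENRES]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, top_n=2):
    """Rank the other movies by attribute similarity to the given title."""
    target = to_vector(MOVIES[title])
    scores = [
        (other, cosine_similarity(target, to_vector(genres)))
        for other, genres in MOVIES.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar("Movie A"))
```

Here "Movie B" ranks first for "Movie A" because the two share most of their attributes, while "Movie C" shares none.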
Recommendation System Applications:
There is a wide variety of applications for recommendation systems, especially in the data science field. For example, music and video companies like Netflix, YouTube, and Spotify use them to generate music and video recommendations. Amazon uses them for product recommendations. Social media platforms such as Facebook and Twitter use them for friend and content recommendations. Restaurants and hotels use them to generate food-related recommendations. They are also used for research articles, financial services, and life insurance.
Implementing Movie Recommendation System in Python
One simple and direct way to develop a movie recommender system is to use the correlation between the attributes of the movies. That way, it finds the similarities between the movies to make a suitable recommendation for the user. I used the MovieLens data from Kaggle here, and I employed a machine learning algorithm that filters the data using the content-based filtering method in order to make those evaluations and predictions. I also used the K-nearest neighbor classifier model, which finds the k most similar items to a particular instance based on a given distance metric.
- The diagram below demonstrates the K-nearest neighbor classifier model.
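To make the idea of "the k most similar items based on a distance metric" concrete, here is a minimal from-scratch sketch of a K-nearest neighbor classifier. It is not the library implementation used in the project; the toy feature vectors, the labels, and the function names are assumptions for illustration, and Euclidean distance is just one possible metric.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort the training points by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: distance(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (made up): two numeric features per movie, one rating class per movie.
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, [0.1, 0.1], k=3))
print(knn_predict(train_X, train_y, [5.0, 5.0], k=3))
```

Both the value of k and the distance function are parameters, which is what makes them natural candidates for tuning.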
After doing some Exploratory Data Analysis (EDA), I found that there are only 6 features in the 2 merged datasets. Thus, I decided to extract as many new features as possible from the given ones. Here are some observations from exploring the dataset:
About the dataset:
- Number of Movies in the Dataset: 10325 movies
- Number of Users in the Dataset: 668 users
- The most common rating given to movies is 4.0
- Only 1198 movies have a rating of 0.5 (the lowest rating)
- The plot shows the counts of the top 10 genres into which the movies in this dataset are categorized.
- The genre with the highest number of movies is Drama
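A genre count like the one described above can be computed along these lines. This is a sketch on made-up rows, assuming the pipe-separated `genres` column format found in the MovieLens movies file; the `genre_counts` helper is a hypothetical name.

```python
from collections import Counter

# Made-up rows in the shape of a MovieLens-style movies table.
movies = [
    {"title": "Movie A (1995)", "genres": "Adventure|Animation|Children"},
    {"title": "Movie B (1999)", "genres": "Drama|Romance"},
    {"title": "Movie C (2001)", "genres": "Drama"},
    {"title": "Movie D (2004)", "genres": "Comedy|Drama"},
]


def genre_counts(rows):
    """Count movies per genre; a movie with several genres counts once for each."""
    counts = Counter()
    for row in rows:
        counts.update(row["genres"].split("|"))
    return counts


# Keep the 10 most frequent genres, most frequent first.
top_genres = genre_counts(movies).most_common(10)
print(top_genres)
```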
After using the K-nearest neighbor classifier as the model to make the predictions, its accuracy score was 48.5%, which beat the baseline’s by 48.2%.
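For readers unfamiliar with comparing a model against a baseline, the sketch below shows one way to do it. It assumes the baseline is "always predict the most frequent rating", which is a common convention but is not confirmed as the baseline used here; the ratings and predictions are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label (assumed baseline)."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Made-up ratings, for illustration only.
y_train = [4.0, 4.0, 3.0, 5.0, 4.0, 2.0]
y_test = [4.0, 3.0, 4.0, 5.0]
y_pred = [4.0, 3.0, 5.0, 5.0]  # hypothetical model output

print("model:", accuracy(y_test, y_pred))
print("baseline:", majority_baseline_accuracy(y_train, y_test))
```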
I tried to optimize the model using the best parameters found by GridSearchCV, but the accuracy did not increase.
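GridSearchCV is scikit-learn's tool for this kind of tuning. To show the idea behind it without depending on the library, the sketch below hand-rolls a small grid search: it tries each candidate k, scores it with cross-validation, and keeps the best. It is a simplified stand-in, not scikit-learn's implementation, and the data, fold scheme, and function names are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=3):
    """Mean accuracy of k-NN over n_folds interleaved train/validation splits."""
    scores = []
    for fold in range(n_folds):
        val_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / n_folds


def grid_search_k(X, y, candidate_ks, n_folds=3):
    """Return (best_k, best_score); ties go to the earliest candidate."""
    results = {k: cross_val_accuracy(X, y, k, n_folds) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Toy, well-separated data (made up).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.3, 5.3], [5.1, 5.0], [5.0, 5.3]]
y = ["low"] * 6 + ["high"] * 6

best_k, best_score = grid_search_k(X, y, [1, 3, 5])
print(best_k, best_score)
```

On data this clean every candidate scores the same, so the search cannot improve on the first setting. Tuning only helps when the candidates actually differ in validation accuracy.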
Although I extracted more than 20 features from the original 6, there was a shortage of information about the movies and their details. So, I believe that the accuracy score could be better if I had more details related to the movies (e.g. actors and director).
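As an illustration of the kind of feature extraction mentioned above, the sketch below derives a few new columns from a title and a genre string. The specific features, the `extract_features` helper, and the sample movie are hypothetical and are not the 20+ features actually built for this project; it assumes MovieLens-style titles ending in a year in parentheses and pipe-separated genres.

```python
import re

# Matches a four-digit year in parentheses at the end of a title, e.g. "(1995)".
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def extract_features(title, genres, genre_vocab):
    """Derive a few illustrative features from a movie's title and genre string."""
    match = YEAR_PATTERN.search(title)
    year = int(match.group(1)) if match else None
    genre_list = genres.split("|") if genres else []
    features = {
        "year": year,
        "decade": (year // 10) * 10 if year is not None else None,
        "n_genres": len(genre_list),
        "title_length": len(YEAR_PATTERN.sub("", title).strip()),
    }
    # One 0/1 indicator column per genre in the vocabulary.
    for genre in genre_vocab:
        features[f"is_{genre.lower()}"] = int(genre in genre_list)
    return features


features = extract_features(
    "Example Movie (1995)", "Adventure|Animation|Children", ["Animation", "Drama"]
)
print(features)
```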