Creating a Recommender System (Part II)

After the resounding success of the first article on recommender systems, Alvaro Santos is back with some further insight into creating a recommender system.

 

Coming soon: A follow-up Meetup in Madrid to go even further into this exciting topic. Stay tuned!

***

In the previous article of this series, we explained what a recommender system is, describing its main parts and providing some basic algorithms which are frequently used in these systems. We also explained how to code some functions to read JSON files and to map the data in MongoDB and ElasticSearch using Spark SQL and Spark connectors.

This second part will cover:

  • Generating our Collaborative Filtering model.
  • Pre-calculating product / user recommendations.
  • Launching a small REST server to interact with the recommender.
  • Querying the data store to retrieve content-based recommendations.
  • Mixing the different types of recommendations to create a hybrid recommender.

Collaborative Filtering Algorithm

For this example recommender service, we have chosen ALS (Alternating Least Squares) as the Collaborative Filtering algorithm. Although ALS is the only Collaborative Filtering algorithm implemented in Spark's MLlib, it has been widely tested and performs well, which makes it a good fit for this project.

You can learn more about Alternating Least Squares at this link.

Recommender Trainer

In this section, we will code the program that will create our Collaborative Filtering model. It will pre-calculate all recommendations to ensure a faster service.

First of all, we should read all the reviews from MongoDB:
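A minimal sketch of this step, assuming the official MongoDB Spark connector and a ratings collection whose documents carry userId, productId and rating fields (the names are placeholders; adapt them to the mapping defined in the first article):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Input and output URIs are illustrative; database and collection names
// are placeholders for this example.
val spark = SparkSession.builder()
  .appName("RecommenderTrainer")
  .config("spark.mongodb.input.uri", "mongodb://localhost:27017/recommender.ratings")
  .config("spark.mongodb.output.uri", "mongodb://localhost:27017/recommender.user_recs")
  .getOrCreate()

// Read the reviews/ratings collection into a DataFrame.
val ratingsDF: DataFrame = spark.read
  .format("com.mongodb.spark.sql.DefaultSource")
  .load()
  .select("userId", "productId", "rating")
```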

However, the data cannot be used by the Spark API "as it comes" from the DB. We must transform our ratings DataFrame into an RDD of Spark Ratings:
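A sketch of the conversion (the column names follow the DataFrame read above):

```scala
import org.apache.spark.mllib.recommendation.Rating

// Map every row of the DataFrame into MLlib's Rating(user, product, rating).
val ratingsRDD = ratingsDF.rdd.map { row =>
  Rating(
    row.getAs[Int]("userId"),
    row.getAs[Int]("productId"),
    row.getAs[Double]("rating"))
}.cache()
```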

Now it is time to create our ALS model:
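Something along these lines; the hyper-parameters are illustrative values and should be tuned against your own data (for example with a train/validation split):

```scala
import org.apache.spark.mllib.recommendation.ALS

val rank = 10          // number of latent factors
val numIterations = 10 // ALS iterations
val lambda = 0.01      // regularisation parameter

// Train the matrix factorisation model on the ratings RDD built above.
val model = ALS.train(ratingsRDD, rank, numIterations, lambda)
```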

Once we have trained the model, the next step is to pre-calculate the recommendations. First, however, we need to create two lists containing all the products and users:
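For example:

```scala
// Distinct user and product ids seen in the ratings.
val users = ratingsRDD.map(_.user).distinct().collect()
val products = ratingsRDD.map(_.product).distinct().collect()
```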

Then we need to calculate the user recommendations using the Spark API and save the data to MongoDB:
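A sketch of this step: recommendProductsForUsers returns the top-N products for every user in a single pass, and the output collection name (user_recs) is an assumption:

```scala
import org.apache.spark.sql.SaveMode
import spark.implicits._

val maxRecs = 10

// For every user, keep (userId, productId, predicted score) triples.
val userRecsDF = model.recommendProductsForUsers(maxRecs)
  .flatMap { case (userId, recs) =>
    recs.map(r => (userId, r.product, r.rating))
  }
  .toDF("userId", "productId", "score")

// Persist the pre-calculated recommendations through the MongoDB connector.
userRecsDF.write
  .format("com.mongodb.spark.sql.DefaultSource")
  .option("collection", "user_recs")
  .mode(SaveMode.Overwrite)
  .save()
```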

Finally, we have to pre-calculate the product recommendations. Spark does not provide a direct way of calculating recommendations for products, so we will measure product similarity using the cosine similarity:
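A minimal, driver-side sketch: it computes the cosine similarity between the latent factor vectors that ALS learnt for each product, and assumes the catalogue is small enough for those vectors to fit in memory (for a large catalogue a distributed approach such as RowMatrix.columnSimilarities would be preferable):

```scala
// Cosine similarity between two factor vectors.
def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
  val dot = (a zip b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  dot / (normA * normB)
}

// (productId, factors) pairs learnt by ALS.
val productFeatures = model.productFeatures.collect()

// For every product, keep the maxRecs most similar products.
val productRecs = productFeatures.map { case (id, factors) =>
  val similar = productFeatures
    .filter { case (otherId, _) => otherId != id }
    .map { case (otherId, otherFactors) => (otherId, cosineSimilarity(factors, otherFactors)) }
    .sortBy(-_._2)
    .take(maxRecs)
  (id, similar)
}
```

The resulting (productId, similar products) pairs can then be written to MongoDB in the same way as the user recommendations.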

Recommender Service

After saving all the products/reviews and the pre-calculated Collaborative Filtering recommendations in the DBs, it is time to create a simple REST service that will serve the final recommendations. For that purpose we have selected Spray.io, a framework that is simple, elegant and pure Scala.

Creating URL mappings for our recommender service is very simple:
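A minimal sketch of the mappings using Spray's routing DSL; the paths, the port and the helper functions (collaborativeRecommendations, contentBasedRecommendations) are assumptions, and the helpers are sketched in the sections below:

```scala
import akka.actor.ActorSystem
import spray.routing.SimpleRoutingApp

object RecommenderServer extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("recommender-service")

  startServer(interface = "localhost", port = 8080) {
    get {
      // Collaborative Filtering recommendations for a user.
      path("recs" / "cf" / Segment) { userId =>
        complete(collaborativeRecommendations(userId))
      } ~
      // Content-based recommendations for a product.
      path("recs" / "content" / Segment) { productId =>
        complete(contentBasedRecommendations(productId))
      }
    }
  }
}
```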

Now it is time to code our recommendation services. We start with the Collaborative Filtering recommendations. This case is simple because they have already been pre-calculated, so we can just read them from MongoDB:
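A sketch using the official MongoDB Scala driver (the connection string, database and collection names are assumptions; the result is returned as a JSON string so that Spray can complete the request with it directly):

```scala
import org.mongodb.scala._
import org.mongodb.scala.model.Filters.equal
import scala.concurrent.Await
import scala.concurrent.duration._

val mongoClient = MongoClient("mongodb://localhost:27017")
val userRecs = mongoClient.getDatabase("recommender").getCollection("user_recs")

// Look up the pre-calculated recommendations for one user.
def collaborativeRecommendations(userId: String, limit: Int = 10): String = {
  val docs = Await.result(
    userRecs.find(equal("userId", userId.toInt)).limit(limit).toFuture(),
    10.seconds)
  docs.map(_.toJson()).mkString("[", ",", "]")
}
```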

Next we code the content-based recommendations. Although we have not pre-calculated these, it is quite simple to obtain them with ElasticSearch: we just ask the server which products best match certain criteria:
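One way to do this is with a more_like_this query against the products index (the index and field names are assumptions, the query syntax follows recent Elasticsearch versions, and plain java.net is used here to keep the sketch dependency-free):

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Ask ElasticSearch for the products most similar to a given one.
def contentBasedRecommendations(productId: String, limit: Int = 10): String = {
  val query =
    s"""{
       |  "size": $limit,
       |  "query": {
       |    "more_like_this": {
       |      "fields": ["name", "description"],
       |      "like": [{ "_index": "products", "_id": "$productId" }],
       |      "min_term_freq": 1,
       |      "min_doc_freq": 1
       |    }
       |  }
       |}""".stripMargin

  val conn = new URL("http://localhost:9200/products/_search")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)
  val writer = new OutputStreamWriter(conn.getOutputStream)
  writer.write(query)
  writer.close()

  Source.fromInputStream(conn.getInputStream).mkString
}
```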

For hybrid recommendations, the theory is simple: use different types of recommendations and combine their output using weights.
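A minimal sketch of that idea, assuming each underlying recommender yields (productId, score) pairs normalised to the same scale; the weights are illustrative:

```scala
// Combine two recommendation lists by weighting and summing their scores.
def hybridRecommendations(
    collaborative: Seq[(String, Double)],
    contentBased: Seq[(String, Double)],
    cfWeight: Double = 0.7,
    cbWeight: Double = 0.3,
    limit: Int = 10): Seq[(String, Double)] = {

  val weighted =
    collaborative.map { case (id, score) => (id, score * cfWeight) } ++
    contentBased.map { case (id, score) => (id, score * cbWeight) }

  // Sum the contributions per product and keep the best-scoring ones.
  weighted
    .groupBy(_._1)
    .map { case (id, scores) => (id, scores.map(_._2).sum) }
    .toSeq
    .sortBy(-_._2)
    .take(limit)
}
```

Products suggested by both recommenders naturally rise to the top, since their weighted scores are added together.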

Conclusions

In the second part of the series, we have learnt how to:

1. Create Collaborative Filtering recommendations using Spark.
2. Obtain content-based recommendations using ElasticSearch.
3. Combine several types of recommendations to create a hybrid recommender.

If you are interested in finding out more, the code is freely available in my GitHub repository.
