Data Lake: A More Technical Point of View

Companies have come to realize of late that the real value of their business lies in its data. There has been a rush to create huge Data Lakes to store the enormous amounts of data available inside each company. The concept of a Data Lake is that of a low-cost but highly scalable infrastructure in which all types of data can be stored.

This sounds good, but creating a Data Lake is not easy and a good design is a must.

Creating a Recommender System (Part II)

After the resounding success of the first article on recommender systems, Alvaro Santos is back with some further insight into creating a recommender system.


Coming soon: A follow-up Meetup in Madrid to go even further into this exciting topic. Stay tuned!


In the previous article of this series, we explained what a recommender system is, describing its main parts and providing some basic algorithms which are frequently used in these systems. We also explained how to code some functions to read JSON files and to map the data in MongoDB and ElasticSearch using Spark SQL and Spark connectors.

This second part will cover:

  • Generating our Collaborative Filtering model.
  • Pre-calculating product / user recommendations.
  • Launching a small REST server to interact with the recommender.
  • Querying the data store to retrieve content-based recommendations.
  • Mixing the different types of recommendations to create a hybrid recommender.

Creating a Recommender System (Part I)

This two-article series explains how to design and implement a hybrid recommender system that works just like the ones used by Amazon or eBay.


Let’s start with a short definition from Wikipedia:

Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item.

The following diagram is a basic illustration:

Recommender System diagram

A recommender system analyses input data which contains information on different products and their user ratings. After reading and processing the data, the system creates a model that can be used to predict ratings for a particular product or user.
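To make the "read ratings, build a model, predict a rating" flow concrete, here is a minimal Python sketch. It is not the model used in this series: it assumes a simple baseline predictor (global average plus a per-user and a per-product offset), and the names `fit_baseline` and `predict` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Build a toy model from (user, product, rating) tuples.

    Assumption: a baseline predictor, not a full collaborative filter.
    """
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Collect how far each rating sits from the global mean,
    # grouped by user and by product.
    user_devs = defaultdict(list)
    product_devs = defaultdict(list)
    for user, product, rating in ratings:
        user_devs[user].append(rating - global_mean)
        product_devs[product].append(rating - global_mean)

    user_bias = {u: sum(d) / len(d) for u, d in user_devs.items()}
    product_bias = {p: sum(d) / len(d) for p, d in product_devs.items()}
    return global_mean, user_bias, product_bias


def predict(model, user, product):
    """Predict a rating; unknown users or products contribute no offset."""
    global_mean, user_bias, product_bias = model
    return global_mean + user_bias.get(user, 0.0) + product_bias.get(product, 0.0)
```

A call such as `predict(fit_baseline(ratings), "ben", "lamp")` returns an estimated rating for a user and product pair, even one that does not appear in the input data. A production system would replace this baseline with a trained model, but the input and output shapes stay the same.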


In the recommender system world, there are three types of approaches to filter products: