Creating a Recommender System (Part I)


This two-article series explains how to design and implement a hybrid recommender system that works just like the ones used by Amazon or eBay.

Introduction

Let’s start with a short definition from Wikipedia:

Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item.

The following diagram is a basic illustration:

Recommender System diagram

A recommender system analyses input data containing information on different products and the ratings users have given them. After reading and processing the data, the system creates a model that can be used to predict ratings for a particular product or user.
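As a rough illustration of that read, model, predict cycle, here is a minimal sketch in Python. It is not the hybrid approach this series goes on to build: it is a deliberately simple baseline that predicts each product's mean rating and falls back to the global mean for unseen products. The function names (`fit_item_means`, `predict`) and the sample data are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def fit_item_means(ratings):
    """Build a baseline model from (user, item, rating) tuples.

    The model is the mean rating of each item plus the global mean,
    which serves as a fallback for items with no ratings.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    item_means = {item: totals[item] / counts[item] for item in totals}
    global_mean = sum(totals.values()) / sum(counts.values())
    return item_means, global_mean


def predict(model, item):
    """Predict a rating for an item, using the global mean if it is unknown."""
    item_means, global_mean = model
    return item_means.get(item, global_mean)


# Hypothetical input data: (user, product, rating)
ratings = [
    ("ana", "book", 5.0),
    ("ben", "book", 3.0),
    ("ana", "lamp", 2.0),
]
model = fit_item_means(ratings)
print(predict(model, "book"))  # mean of the two "book" ratings
print(predict(model, "desk"))  # unseen product: falls back to the global mean
```

A real system would also take the user into account, which this baseline ignores on purpose to keep the sketch short.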

Approaches

In the recommender system world, there are three types of approaches to filter products:

The Developer’s Guide to Scala Implicit Values (Part I)

Implicit parameters and conversions are powerful tools in Scala increasingly used to develop concise, versatile tools such as DSLs, APIs, libraries…

When used correctly, they reduce the verbosity of Scala programs and so make code easier to read. But their most important feature is that they offer a way to extend the functionality of your libraries without having to change their code or distribute it.

With great power comes great responsibility, however. For newcomers, as well as for relatively experienced Scala users, they can become a galaxy of confusion and pitfalls, because using implicit values means the compiler makes decisions that are not obviously described in the code and follows a set of rules with some unexpected results.

This post aims to shed some light on the use of implicit values. Its content isn’t 100% original; it is just a tourist guide through this code jungle, which is full of marvels and sometimes dangerous.
Like most of those monstrous things that make us shiver, implicit values are mostly harmless once you get to know them.

Benchmarking Machine learning prediction models

When surfing the internet, it is quite easy to find sites comparing the most popular machine learning toolkits (datascience.stackexchange.com, oreilly.com or udacity.com). These sites give you a lot of information about the strengths and weaknesses of the libraries and how they work, along with examples that show how easy these tools are to use. Therefore, if you are new to the business, they are very helpful for finding the right library to begin studying your data. In fact, they are written by data scientists for data scientists.

However, as a software engineer you would rather know whether these tools are going to work well or just crash your servers. Based on this premise, the main objective of this article is to explore some machine learning libraries and see how they work in a real-time, semi-production scenario.

Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra

In this post we will show how to use the different SQL contexts to query data in Spark.
We will begin with Spark SQL and follow up with HiveContext. In addition, we will run queries against various NoSQL databases and analyze the advantages and disadvantages of using them. So, without further ado, let’s get started!

First of all, we need to create a context that adds the configuration options for connecting to Cassandra to Spark:

Spark SQLContext allows us to connect to different data sources to read or write data, but it has a limitation: every link to a data source that we create is temporary, so once the program ends or the Spark shell is closed, it will not be available in the next session.

Huawei Appoints Stratio as Technology Partner

Proud to share the press release announcing Stratio as Huawei’s technological partner and looking forward to working together.

AMSTERDAM, Nov. 5, 2015 /PRNewswire/ — Huawei announced that Stratio has officially been certified as a Huawei Solution Partner (Technology) for Enterprise Data Centre Solutions.

Stratio, which pioneered the first Big Data platform using Apache Spark and integrating main NoSQL and SQL distributed databases, becomes Huawei’s first Big Data Technology Partner. The Stratio platform reduces complexity compared to other platforms, by giving customers control over all their Big Data software, and reduces Big Data time-to-value tenfold.

100 Stratians and counting

When we first started using Spark, we were twenty people. Twenty Stratians. We took a risk and adopted Spark very early on, but with a lot of teamwork and a lot of mistakes, we managed to create the first pure Spark platform.

We started getting more projects, and without realizing it 20 turned into 40, 40 into 60, and 60 into 100 Stratians. And we haven’t stopped growing ever since.

A Spark-based analytics solution for Online Advertisers

This post contains the winning solution for the Stratio Challenge 2015, developed by Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco (BitBang).

Abstract:
This work describes the data model and the architecture of a Big Data Analytics solution that helps online advertisers get fast answers to their analytical questions about impressions, clicks and purchases, based on the Stratio Challenge requirements.
The emphasis is placed on the technologies that have been used, with particular focus on Spark and Spray.

Supporting service-based multi realm authentication and authorization

Security is often a forgotten concern in Big Data environments. However, as these technologies are being embraced by companies with sensitive data (think, for example, about banks or insurance companies), security is a growing requirement. At Stratio, we are aware of our clients’ needs, so we are studying the development of an integrated security solution for our platform.

We welcome Mnemo as our partner

MNEMO is a private Spanish company, founded in 2000 in Madrid, Spain. It has over 1,000 employees and operates in 11 countries (Spain, United States, Mexico, Colombia, Peru, Bolivia, Chile, Ecuador, Great Britain, Turkey and Saudi Arabia), with $94.5 million in turnover. Our strategic commitment has always been to provide excellent service to our customers and to earn their highest loyalty, since we consider them our best referral. Currently MNEMO is focused on providing the market with high value-added solutions and services through our two strategic divisions: MNEMO Technology and MNEMO Security.

MNEMO Technology is the division in charge of the organisation’s Research and Development projects. Through the Centre for Technology Research and Development, it develops software products that stand out for their high level of innovation and their adherence to the latest technological market trends.

MNEMO Security is the division that handles the implementation of security projects. Through the CERTs and SOCs from which it operates, MNEMO provides managed security services as well as services to detect and respond to security incidents. It has specialised laboratories in the fields of digital forensics, malware and botnet analysis, and electronic devices.