Employee turnover: the good, the bad, and the ugly


It is a truism in Human Resources that labor turnover is generally bad news for a company, and that management must take precautionary measures to reduce it, or at least to keep it under control.

When there is market demand for the services performed by an employee who leaves a company, the latter finds itself with unforeseen expenses to qualify, source, hire, and “onboard” a suitable replacement.

Aside from the out-of-pocket expense, which is directly reflected in the accounts, the wound usually runs deeper than that. There are hidden costs linked to the loss of business in the area the former employee was presumably contributing to, both during the period in which the position is open and during the time it takes the new employee to settle in and reach the peak level of productivity of his or her predecessor. These hidden costs can also be traced back to the loss of knowledge, which may leave a gap that affects other employees in the department and hinders cooperation. Such indirect costs are hard to quantify properly, but they typically dwarf the direct accounting costs that are immediately felt in the cash flow statements. Estimates of the impact taken from the literature range from 25% to 200% of the departing employee’s annual compensation, with the figure probably depending on the industry or even the individual business.

Spark and Kerberos: a safe story

A follow-up to this post will be presented at Spark Summit East in Boston in February. Find out more.

***

Amongst all the Big Data technology madness, security seems to be an afterthought at best. When people talk about Big Data technologies and security, they are usually referring to the integration of these technologies with Kerberos. That said, the trend seems to be changing for the better, and we now have a few security options for these technologies, such as TLS. Against this backdrop, we would like to take a look at the interaction between the most popular large-scale data processing technology, Apache Spark, and the most popular authentication framework, MIT’s Kerberos.

Creating a Recommender System (Part II)

After the resounding success of the first article on recommender systems, Alvaro Santos is back with some further insight into creating a recommender system.

 

Coming soon: A follow-up Meetup in Madrid to go even further into this exciting topic. Stay tuned!

***

In the previous article of this series, we explained what a recommender system is, describing its main parts and providing some basic algorithms which are frequently used in these systems. We also explained how to code some functions to read JSON files and to map the data in MongoDB and ElasticSearch using Spark SQL and Spark connectors.

This second part will cover:

  • Generating our Collaborative Filtering model.
  • Pre-calculating product / user recommendations.
  • Launching a small REST server to interact with the recommender.
  • Querying the data store to retrieve content-based recommendations.
  • Mixing the different types of recommendations to create a hybrid recommender.
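As a taste of that last step, here is a minimal, framework-free Python sketch of mixing two kinds of recommendations into a hybrid ranking. The function name, scores, and weight are purely illustrative assumptions; the series itself builds its models with Spark.

```python
def mix_recommendations(collab_scores, content_scores, alpha=0.6):
    """Blend collaborative-filtering and content-based scores into a
    single hybrid ranking. `alpha` weights the collaborative part."""
    products = set(collab_scores) | set(content_scores)
    hybrid = {
        p: alpha * collab_scores.get(p, 0.0)
           + (1 - alpha) * content_scores.get(p, 0.0)
        for p in products
    }
    # Highest hybrid score first.
    return sorted(hybrid.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores from the two recommenders:
collab = {"laptop": 0.9, "mouse": 0.4}
content = {"mouse": 0.8, "keyboard": 0.7}
ranking = mix_recommendations(collab, content)
```

A product that scores moderately well in both recommenders (the mouse here) can outrank one that only a single recommender likes, which is exactly the appeal of the hybrid approach.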

Ideas from Big Data Spain 2016

By Sondos Atwi @Sondos_4

On the 17th and 18th of November, I attended the Big Data Spain conference. It was my first time attending this type of event, and it was an excellent opportunity to meet experts in the field and attend high-quality talks. So I decided to write this post to share a few of the slides and ideas presented.

P.S.: Please excuse the quality of some of the slides/pictures; they were all taken with my phone camera 🙂

First, congrats to Big Data Spain on being the second biggest Big Data conference in Europe, right after O’Reilly Strata. This year’s edition also saw around a 50% increase over last year’s!

Big Data Europe Conferences

 

Now let’s dig into the details…

Continuous Delivery in depth #2

The not so lean side

Remember issue #1, published in the summer? We are back with the next part in the series, wearing the hat of Pitfall Harry to look at some of the issues we have come across and how they have impacted our day-to-day work. We also include some tips for overcoming them.

First things first: Jenkins’ pipelines are an awesome improvement over basic Jenkins functionality, allowing us to easily build complex continuous delivery flows with a high degree of reusability and maintainability.

Having said this, pipelines are code. And code is written by human beings. Human beings make mistakes. Such errors are reflected as software defects and execution failures.

This post will take a look at some of the defects, pitfalls and limitations of our (amazing) Jenkins’ pipelines, and suggest some possible workarounds.

Driving Digital Transformation through Big Data

A Stratio Success Story

“Stratio DataCentric came into existence because of a technological gap that exists in the world today,” Nacho Navarro, Stratio

What is Stratio? This is a question we can only really answer now, three years after the company was founded in 2013 by a team of seasoned engineers. Why has it taken us so long? Because we have been busy pulling together the most transformational and disruptive tool yet in the short history of Big Data. We started with a vision and have made it a reality.

Javier Cortejoso, Gaspar Muñoz and Nacho Navarro reminisce about the journey towards the creation of Stratio’s powerful, state-of-the-art tool: Stratio DataCentric.

Stratio Crossdata vs Presto

Introduction

Nowadays, there are a lot of Big Data query engines available. Some companies struggle to choose which one to use. Benchmarks exist, but results can be contradictory and thus difficult to trust.

One Big Data query engine that is frequently mentioned is Presto. We wanted to find out more about its potential and decided to compare it with Crossdata in a controlled environment, given that Crossdata is a data hub that extends the capabilities of Apache Spark. The most popular persistence layers in our projects are Apache Cassandra, MongoDB and HDFS+Parquet, but MongoDB is not supported by Presto. The benchmark was therefore carried out with Apache Cassandra and HDFS+Parquet only.

Crossdata provides additional features and optimizations on top of Spark’s SQLContext through the XDContext. It can be deployed as a library within Apache Spark or using a client-server architecture in which the cluster of servers forms a P2P structure.

Creating a Recommender System (Part I)


This two-article series explains how to design and implement a hybrid recommender system that works just like the ones used by Amazon or eBay.

Introduction

Let’s start with a short definition from Wikipedia:

Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item.

The following diagram is a basic illustration:

Recommender System diagram

A recommender system analyses input data containing information on different products and their user ratings. After reading and processing the data, the system creates a model that can be used to predict ratings for a particular product or user.
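To make the prediction step concrete, here is a tiny user-based sketch in plain Python (an illustration only; the series itself builds its model with Spark, and the ratings below are invented): a user’s rating for a product is predicted as a similarity-weighted average of other users’ ratings for it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users' rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[p] * v[p] for p in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def predict_rating(ratings, user, product):
    """Similarity-weighted average of the other users' ratings."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or product not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[product]
        den += abs(sim)
    return num / den if den else 0.0

ratings = {
    "ana":  {"book": 5, "film": 3},
    "ben":  {"book": 4, "film": 2, "game": 4},
    "cruz": {"book": 1, "game": 2},
}
score = predict_rating(ratings, "ana", "game")
```

Because ana’s tastes resemble ben’s more than cruz’s, the predicted rating for the game lands closer to ben’s 4 than to cruz’s 2.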

Approaches

In the recommender system world, there are three main approaches to filtering products:

  • Collaborative filtering.
  • Content-based filtering.
  • Hybrid approaches, which combine the two.

The Developer’s Guide to Scala Implicit Values (Part II)

Imagine a rectangular grid of cells, in which each cell has a value: either black (dead) or white (alive). And imagine that:

  1. Any live cell with two or three live neighbors survives to the next generation.
  2. Any live cell with four or more live neighbors dies from overpopulation.
  3. Any live cell with one or no live neighbors dies from isolation.
  4. Any dead cell with exactly three live neighbors comes to life.

 

These are the four simple rules of Conway’s Game of Life. You could hardly imagine a simpler set of rules to code on your computer, and you wouldn’t expect any interesting result at all, but…

Behold the wonders of its hidden might!
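For illustration, the four rules above fit in a few lines of plain Python (a standalone toy, separate from the Scala material of this article), representing the board as a set of live `(x, y)` coordinates:

```python
def neighbors(cell):
    """The eight cells surrounding (x, y)."""
    x, y = cell
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)}

def step(alive):
    """Apply the four rules once. `alive` is the set of live cells."""
    # Count live neighbors for every cell adjacent to a live one.
    counts = {}
    for cell in alive:
        for n in neighbors(cell):
            counts[n] = counts.get(n, 0) + 1
    # Rules 1-3: a live cell survives with exactly 2 or 3 live neighbors.
    survivors = {c for c in alive if counts.get(c, 0) in (2, 3)}
    # Rule 4: a dead cell with exactly 3 live neighbors comes to life.
    births = {c for c, k in counts.items() if k == 3 and c not in alive}
    return survivors | births

# A "blinker": three cells in a row oscillate between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
next_gen = step(blinker)
```

Run `step` repeatedly and the famous gliders, oscillators and still lifes emerge from nothing more than the four rules.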

Stratio @ #MesosCon Europe

MesosCon Europe was held in Amsterdam from August 31 to September 2, and a small delegation of Stratio’s crew was there.

Benjamin Hindman’s opening keynote

Mesosphere’s Co-Founder & Chief Architect Benjamin Hindman broke the ice with the first keynote.

Stratio team at MesosCon Europe
Alberto Rodriguez and Andrés Macarrilla at MesosCon Europe

After talking about the Mesos ecosystem’s growth over recent months, he explained the nested containerization model and the improvements in Mesos resource allocation. He then introduced praekelt.org, an African nonprofit organization dedicated to using mobile technology to improve the lives of people living in poverty. A representative from the NGO explained how Mesos and DC/OS are a perfect fit for its cluster provisioning: the NGO has to run quite a few clusters in which 80% of the setup is identical and the remaining 20% differs, so they get the most out of Mesos and DC/OS by deploying these distinctive features separately. Asked what Mesos’ biggest deficiency is nowadays, the representative replied that they were struggling to find a persistence layer that fits their current needs (he pointed out that they are currently using GlusterFS as their persistence backend).