Evolutionary feature selection in big datasets (Part I)

When we want to fit a Machine Learning (ML) model to a big dataset, it is often recommended to carefully pre-process the input data in order to obtain better results. Although it is widely accepted that more data lead to better results, this is not necessarily true when referred to the number of variables of our data. Some variables may be noisy, redundant and not useful. 

Profiling and segmentation: A graph database clustering solution

This post is about an exciting journey that starts with a problem and ends with a solution. One of the top banks in Europe came to us with a request: they needed a better profiling system.

We came up with a methodology for clustering nodes in a graph database according to concrete parameters.

We started by developing a Proof of Concept (POC) to test an approximation of the bank’s profiling data, using the following technologies:

  • Java / Scala, as Programming languages.
  • Apache Spark, to handle the given Data Sets.
  • Neo4j, as graph database.

The POC began as a 2-month long project in which we rapidly discovered a powerful solution to the bank’s needs.

We decided to use Neo4j, along with Cypher, Neo4j’s query language, because relationships are a core aspect of their data model. Their graph databases can manage highly connected data and complex queries. We were then able to make node clusters thanks to GraphX, an Apache Spark API for running graph and graph-parallel compute operations on data.

Ideas from Big Data Spain 2016

By Sondos Atwi @Sondos_4

On the 17th and 18th of November, I attended the Big Data Spain conference. It was my first time attending this type of events, and it was an excellent opportunity to meet experts in the fields and attend high-quality talks. So I decided to write this post to share a few of the presented slides and ideas.

Ps: Please excuse the quality of some slides/pictures, they were all taken by my phone camera 🙂

First, Congrats to Big Data Spain on being the second biggest Big Data conference in Europe, right after O’Reilly Strata. This year’s edition alsoBig Data Europe Conferences
had around 50% increase than last year’s!


Now let’s dig into the details…