Evolutionary feature selection in big datasets (Part I)

When we want to fit a Machine Learning (ML) model to a big dataset, it is often recommended to carefully pre-process the input data in order to obtain better results. Although it is widely accepted that more data lead to better results, this is not necessarily true of the number of variables in our data. Some variables may be noisy, redundant, or simply not useful.

Data Lake: A More Technical Point of View


Companies have recently come to realize that the real value of their business lies in its data. There has been a rush to create huge Data Lakes to store the enormous amounts of data available inside each company. The concept of a Data Lake is that of a low-cost but highly scalable infrastructure in which all types of data can be stored.

This sounds good, but creating a Data Lake is not easy, and a good design is a must.