Paper of the week: “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data” [1]

This paper was presented at the EuroSys 2013 conference and is available for download at the conference website. It presents BlinkDB which, despite its name, is not a database but a query engine built on top of Hive and Shark. BlinkDB runs interactive SQL queries on large volumes of data by executing them against data samples instead of the full dataset. It is built on two key ideas: an adaptive optimization framework that builds and maintains stratified samples, and a dynamic sample selection strategy that picks an appropriately sized sample based on a query’s accuracy or response time requirements.
This paper offers an interesting introduction to applying statistical inference techniques to Big Data and makes clear that there is always a trade-off between accuracy and performance. In that regard, BlinkDB reports the accuracy of each query result so the user can make informed decisions. Although it is not clear what the cost of maintaining stratified samples is, the paper provides a good starting point for future work in the area.
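To make the sampling idea more concrete, here is a minimal Python sketch of a stratified sample and an error-bounded estimate. It is only an illustration under simplifying assumptions, not BlinkDB's implementation: the function names `stratified_sample` and `estimate_mean`, the `cap` parameter, and the normal-approximation confidence interval are all hypothetical choices made for this example.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct value of `key`.

    Returns {value: (sampled_rows, original_group_size)}.
    Rare groups (size <= cap) are kept whole, so they are never lost,
    which is the point of stratifying instead of sampling uniformly.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = {}
    for value, members in groups.items():
        if len(members) <= cap:
            sample[value] = (list(members), len(members))
        else:
            sample[value] = (rng.sample(members, cap), len(members))
    return sample


def estimate_mean(values, population, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumes `values` is a simple random sample drawn without replacement
    from a group of size `population`, and applies the finite population
    correction so a fully retained group reports zero error.
    """
    n = len(values)
    mean = sum(values) / n
    if n >= population:
        return mean, 0.0  # the whole group was kept: the answer is exact
    if n < 2:
        return mean, float("inf")  # one row cannot bound the error
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    fpc = 1.0 - n / population
    return mean, z * math.sqrt(var / n * fpc)


if __name__ == "__main__":
    # Hypothetical data: one very frequent group and one rare group.
    rows = [{"city": "NY", "t": float(i % 10)} for i in range(1000)]
    rows += [{"city": "Galena", "t": 5.0}, {"city": "Galena", "t": 7.0}]
    for city, (kept, size) in stratified_sample(rows, "city", cap=100).items():
        mean, err = estimate_mean([r["t"] for r in kept], population=size)
        print(f"{city}: mean={mean:.2f} +/- {err:.2f} ({len(kept)}/{size} rows)")
```

A larger `cap` shrinks the reported error but means more rows to scan, which is the accuracy versus response time trade-off described above in miniature.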
[1] Agarwal, Sameer, et al. “BlinkDB: queries with bounded errors and bounded response times on very large data.” Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013.


A formidable 2013 for a Big Data start-up

The first 12 months of a Big Data venture

We will remember 2013 as the year when Stratio was born. The brainchild of Oscar Méndez, Nacho Cabrera, Julio Casal and a few others, Stratio was only a number of disconnected ideas just 12 months ago. Some of the best ones were discussed, sometimes quite passionately, at a number of weekly web conferences between Madrid and Palo Alto in the early months of the year.

Spark Summit 2013 – Stratio was in San Francisco

December 2nd, 2013 was a great day for the Spark community: the first Spark Summit took place in San Francisco. The event confirmed Spark as one of the Big Data tools with the highest adoption rates of recent years. It brought together 450 professionals interested in this technology and presented the roadmap for the upcoming months.