How to Aggregate Data in Real-Time with Stratio Sparta

When working with Big Data, you often need to aggregate data in real time, whether it comes from a specific service, such as a social network (Twitter, Facebook…), or from more diverse sources, like a weather station. Spark Streaming is a good way to process these large volumes of information as they arrive, but it has one drawback: you have to program the aggregation yourself.
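To give a feel for what "programming it yourself" means, here is a minimal sketch of a hand-written aggregation: counting hashtags per fixed time window. It is plain Python, not Spark Streaming code, and the function name `hashtag_counts_per_window`, the `(timestamp, text)` event shape and the 60-second window are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def hashtag_counts_per_window(events, window_seconds=60):
    """Count hashtags per fixed (tumbling) time window.

    `events` is an iterable of (timestamp_in_seconds, text) pairs.
    Returns a dict mapping each window start time to a Counter of hashtags.
    This event shape is hypothetical, chosen only for the example.
    """
    windows = defaultdict(Counter)
    for timestamp, text in events:
        # Align the timestamp to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        for word in text.split():
            # Treat any whitespace-separated token starting with '#' as a hashtag.
            if word.startswith("#") and len(word) > 1:
                windows[window_start][word.lower()] += 1
    return dict(windows)


# Made-up sample messages, not real data.
sample = [
    (0, "Loving #Spark at #Strata"),
    (30, "#spark streaming demo"),
    (75, "More #BigData talk"),
]
result = hashtag_counts_per_window(sample)
```

Even this toy version has to decide how windows are aligned and how hashtags are normalized. A real streaming job would also have to handle late data, state and output, which is the kind of hand-written logic a declarative approach aims to spare you.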

Stratio Sparkta 0.5.0 release

It’s been almost two months since we introduced Stratio Sparkta at Strata London 2015 with a demo of real-time insights on Twitter hashtags (slides available here).

During this time we have added some new features to the real-time aggregation engine, which is based on Spark Streaming, but we have mostly focused on stabilizing the project and laying the groundwork for an upcoming web tool.

In particular, we have been working hard to improve the syntax of the aggregation policy, which has been completely revised. Since you don’t need to code anything in Spark Streaming when using Stratio Sparkta (cool, right?), the declarative definition of aggregation policies is quite important to us.