Variance in Scala (“Luke, he is your father too”)

When working with Big Data, sometimes it’s useful to remember that powerful products wouldn’t work properly without the tools that build them. It’s possible to start programming in Scala with a few case classes and a bunch of for-comprehensions, but that only scratches the surface of a language as deep as Scala, and it may not be enough to make your code clean and comprehensible. I’ve been developing in Scala for almost 4 years, and every day I discover a new feature that surprises me. That realization, in the end, is the main reason to keep digging deeper into Scala.

Stratio’s Lucene-based index for Cassandra is now a plugin

Thanks to the changes proposed at CASSANDRA-8717, CASSANDRA-7575 and CASSANDRA-6480, Stratio is glad to present its Lucene-based implementation of Cassandra secondary indexes as a plugin that can be attached to the Apache distribution. Before these changes, the Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties that maintaining a fork implies. The fork is now discontinued, and new users should use the recently created plugin, which keeps all the features of Stratio Cassandra.


Stratio’s Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities similar to those of ElasticSearch or Solr, including full text search capabilities, free multivariable search,

MongoDB – Spark Connector Whitepaper

We recently worked with MongoDB and their developer team on an analysis of their Hadoop-based connector versus our native connector. The paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API, which lets Spark pass the required columns and filters to the data source and so avoid scanning the entire collection.

Our connector supports the Spark Catalyst optimizer for both rule-based and cost-based query optimization.
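
To make the difference between the two scan APIs concrete, here is a minimal, self-contained sketch in plain Scala. It does not use Spark or MongoDB: Row, Filter, FullScan, PrunedFilteredScanLike and InMemoryCollection are hypothetical stand-ins that only mirror the shape of Spark's TableScan (a buildScan with no arguments) and PrunedFilteredScan (a buildScan that receives the required columns and the filters). An in-memory sequence plays the role of the collection, so treat it as an illustration of the idea rather than the connector's actual code.

```scala
// Simplified stand-ins for Spark's data source scan interfaces.
// These are NOT the real Spark types: everything here is a hypothetical
// miniature that only mirrors the shape of the two APIs.
object ScanSketch {
  type Row = Map[String, Any]

  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Int) extends Filter

  // TableScan-style: the source can only hand back every row in full.
  trait FullScan {
    def buildScan(): Seq[Row]
  }

  // PrunedFilteredScan-style: the engine tells the source which columns
  // and which predicates it needs, so the source can skip the rest.
  trait PrunedFilteredScanLike {
    def buildScan(requiredColumns: Seq[String], filters: Seq[Filter]): Seq[Row]
  }

  final class InMemoryCollection(docs: Seq[Row])
      extends FullScan with PrunedFilteredScanLike {

    // Returns the whole collection, every field included.
    def buildScan(): Seq[Row] = docs

    // Keeps only matching documents, then only the requested fields.
    // A real source would evaluate this inside the database instead.
    def buildScan(requiredColumns: Seq[String], filters: Seq[Filter]): Seq[Row] =
      docs
        .filter(doc => filters.forall(f => matches(doc, f)))
        .map(doc => doc.filter { case (key, _) => requiredColumns.contains(key) })

    private def matches(doc: Row, filter: Filter): Boolean = filter match {
      case EqualTo(attr, value) => doc.get(attr).contains(value)
      case GreaterThan(attr, bound) =>
        doc.get(attr).exists {
          case n: Int => n > bound
          case _      => false
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val users = new InMemoryCollection(Seq(
      Map("name" -> "Ann", "age" -> 31, "city" -> "Madrid"),
      Map("name" -> "Bob", "age" -> 25, "city" -> "London")
    ))
    // Full scan: both documents, all three fields each.
    println(users.buildScan())
    // Pruned and filtered scan: only the "name" field of users older than 30.
    println(users.buildScan(Seq("name"), Seq(GreaterThan("age", 30))))
  }
}
```

With the full scan, the caller receives every document and has to discard what it does not need. With the pruned and filtered variant, the source itself returns only Ann's name, which is the behaviour that keeps a query from reading the whole collection.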

100 Stratians and counting

When we first started using Spark, we were twenty people. Twenty Stratians. We took a risk and adopted Spark very early on, but with a lot of teamwork and a lot of mistakes, we managed to create the first pure Spark platform.

We started getting more projects, and without realizing it 20 turned into 40, 40 into 60, and 60 into 100 Stratians. And we haven’t stopped growing ever since.

Stratio Sparkta 0.5.0 release

It’s been almost two months since we introduced Stratio Sparkta at Strata London 2015, showing a demo for real-time insights on Twitter hashtags (slides available here).

During this time we added some new features to the real-time aggregation engine based on Spark Streaming, but we have focused mainly on stabilizing the project and laying the groundwork for an upcoming web tool.

In particular, we have been working hard to improve the syntax of the aggregation policy, which has been completely revised. Since you don’t need to code anything in Spark Streaming when using Stratio Sparkta (cool, right?), the declarative definition of aggregation policies is quite important to us.

We have a winner!

Last March we began the first Stratio Challenge. After much deliberation, the wait is over and we are proud to announce that the Stratio Challenge winners are Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco, from BitBang.
We had a higher turnout than expected, and even though we only had one prize, we also want to acknowledge, in alphabetical order, the answers submitted by: Alberto Vallejo, Israel Saeta, Javier Cruz, José Luis López Pino and Juan Antonio Cantarero.
We look forward to bringing new challenges and hope you have enjoyed this as much as we have :)

The answer to the challenge can be found here.

A Spark-based analytics solution for Online Advertisers

This post contains the winning solution for the Stratio Challenge 2015, developed by Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco (BitBang).

This work describes the data model and the architecture of a Big Data Analytics solution that can help online advertisers get fast answers to their analytical questions about impressions, clicks and purchases, based on the Stratio Challenge requirements.
The emphasis is placed on the technologies that have been used, with particular focus on Spark and Spray.