A whopping 1K releases using Jenkins!


We don’t usually like to boast, but on this one we can’t hold back. On 17 February 2017 we reached a huge (if purely symbolic) milestone: more than 1000 automated releases performed by our Jenkins installation, from each continuous delivery pipeline.

Huawei Appoints Stratio as Technology Partner

Proud to share the press release announcing Stratio as Huawei’s technological partner and looking forward to working together.

AMSTERDAM, Nov. 5, 2015 /PRNewswire/ — Huawei announced that Stratio has officially been certified as a Huawei Solution Partner (Technology) for Enterprise Data Centre Solutions.

Stratio, which pioneered the first Big Data platform using Apache Spark and integrating main NoSQL and SQL distributed databases, becomes Huawei’s first Big Data Technology Partner. The Stratio platform reduces complexity compared to other platforms, by giving customers control over all their Big Data software, and reduces Big Data time-to-value tenfold.

Stratio’s Lucene-based index for Cassandra is now a plugin

Thanks to the changes proposed in CASSANDRA-8717, CASSANDRA-7575 and CASSANDRA-6480, Stratio is glad to present its Lucene-based implementation of Cassandra secondary indexes as a plugin that can be attached to the Apache distribution. Before these changes, the Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties that maintaining a fork implies. The fork is now discontinued, and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra.

Stratio’s Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities like those of Elasticsearch or Solr, including full-text search capabilities, free multivariable search,

MongoDB – Spark Connector Whitepaper

We recently worked with MongoDB and their developer team to analyze their Hadoop-based connector versus our native connector solution. The paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API, which effectively allows you to avoid scanning the entire collection.

Our connector supports the Spark Catalyst optimizer for both rule-based and cost-based query optimization.
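
The difference between the two scan styles mentioned above can be illustrated with a small sketch. This is a toy model in plain Python, not the Spark data source API and not Stratio's connector: the collection, the function names and the filter representation are all assumptions made for illustration. The idea is that a table scan returns every document in full, while a pruned, filtered scan receives the required columns and the filters up front and returns only what the query needs.

```python
# Toy illustration only: hypothetical names, not the Spark or Stratio API.
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# A stand-in for a MongoDB collection (made-up data).
COLLECTION: List[Row] = [
    {"_id": 1, "user": "ana", "country": "ES", "clicks": 12},
    {"_id": 2, "user": "bob", "country": "UK", "clicks": 3},
    {"_id": 3, "user": "eva", "country": "ES", "clicks": 7},
]


def table_scan(collection: List[Row]) -> List[Row]:
    """Return every row with every field; filtering happens later, elsewhere."""
    return [dict(row) for row in collection]


def pruned_filtered_scan(
    collection: List[Row],
    required_columns: List[str],
    filters: List[Predicate],
) -> List[Row]:
    """Return only the requested columns of the rows that pass all filters."""
    return [
        {col: row[col] for col in required_columns}
        for row in collection
        if all(pred(row) for pred in filters)
    ]


full = table_scan(COLLECTION)
pruned = pruned_filtered_scan(
    COLLECTION,
    required_columns=["user", "clicks"],
    filters=[lambda row: row["country"] == "ES"],
)
print(len(full), pruned)
```

In this sketch the filtering still walks a Python list; in a real connector the benefit comes from handing the column list and filters to the data store, so that less data leaves it in the first place.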

100 Stratians and counting

When we first started using Spark, we were twenty people. Twenty Stratians. We took a risk and adopted Spark very early on, but with a lot of teamwork and a lot of mistakes, we managed to create the first pure Spark platform.

We started getting more projects, and without realizing it 20 turned into 40, 40 into 60, and 60 into 100 Stratians. And we haven’t stopped growing ever since.

We have a winner!

Last March we began the first Stratio Challenge. After much deliberation, the wait is over and we are proud to announce that the Stratio Challenge winners are Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco, from BitBang.
We had a higher turnout than expected, and even though we only had one prize, we want to acknowledge, in alphabetical order, the answers submitted by: Alberto Vallejo, Israel Saeta, Javier Cruz, José Luis López Pino and Juan Antonio Cantarero.
We look forward to bringing new challenges and hope you have enjoyed this as much as we have 🙂

The answer to the challenge can be found here.

A Spark-based analytics solution for Online Advertisers

This post contains the winning solution for the Stratio Challenge 2015, developed by Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco (BitBang).

Abstract:
This work describes the data model and the architecture of a Big Data Analytics solution that can help online advertisers get fast answers to their analytical questions about impressions, clicks and purchases, based on the Stratio Challenge requirements.
The emphasis is placed on the technologies that have been used, with particular focus on Spark and Spray.

Top-k queries in Cassandra: An embedded mapreduce approach

Stratio has just added top-k query support to its Lucene-based implementation of Cassandra’s secondary indexes. This implementation was originally designed to allow embedded full-text and multivariable search in Apache Cassandra. The previous release included an ad-hoc mechanism to perform distributed relevance queries based on Lucene’s scoring algorithm. The current release generalizes this mechanism to allow several types of top-k queries.
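
As a rough illustration of the general idea behind a distributed top-k query, the sketch below has each node compute its own local top k (the "map" step) and a coordinator merge the partial results into a global top k (the "reduce" step). This is a generic pattern written in plain Python, not Stratio's actual implementation: the node names, the scores and the helper functions are hypothetical.

```python
# Generic top-k merge sketch: hypothetical names, not Stratio's implementation.
import heapq
from typing import Dict, Iterable, List, Tuple

ScoredRow = Tuple[float, str]  # (score, row key)


def local_top_k(rows: Iterable[ScoredRow], k: int) -> List[ScoredRow]:
    """Map step: a node keeps only its k best-scoring rows."""
    return heapq.nlargest(k, rows)


def merge_top_k(partials: Iterable[List[ScoredRow]], k: int) -> List[ScoredRow]:
    """Reduce step: the coordinator merges per-node results into a global top k."""
    return heapq.nlargest(k, (row for partial in partials for row in partial))


# Made-up scored rows held by three hypothetical nodes.
NODES: Dict[str, List[ScoredRow]] = {
    "node1": [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    "node2": [(0.8, "d"), (0.7, "e")],
    "node3": [(0.1, "f"), (0.95, "g"), (0.3, "h")],
}

k = 2
partials = [local_top_k(rows, k) for rows in NODES.values()]
top = merge_top_k(partials, k)
print(top)
```

Because every row in the global top k must also be in the top k of the node that holds it, the coordinator only needs k rows from each node rather than the full result set.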

We were at the Cassandra Summit, from 10 to 12 September in San Francisco

The Cassandra Summit 2014 took place on September 10-12 in San Francisco with more than 2000 attendees. The summit was split into 3 days, with training sessions on the first and third days and conference sessions across 6 tracks on the second. The first thing we noticed after passing through the registration desk was how well organized everything was: a smooth registration process, clear track and room directions, and clear demarcation of themes (sponsors, tracks, etc.). In summary, it was one of the best-organized events we have seen, in an appropriate venue and with huge social network activity.