Driving Digital Transformation through Big Data

A Stratio Success Story

“Stratio DataCentric came into existence because of a technological gap that exists in the world today” Nacho Navarro, Stratio

What is Stratio? This is a question that we can really only answer now, three years after our foundation by a team of seasoned engineers in 2013. Why has it taken us so long? Because we have been busy pulling together the most transformational and disruptive tool ever to exist in the short history of Big Data. We started with a vision and have made it a reality.

Javier Cortejoso, Gaspar Muñoz and Nacho Navarro reminisce about the journey towards the creation of Stratio’s powerful, state-of-the-art tool: Stratio DataCentric.

Benchmarking Machine learning prediction models

When surfing the internet, it is quite easy to find sites comparing the most popular Machine learning toolkits (datascience.stackexchange.com, oreilly.com or udacity.com). These sites give you a lot of information about the strengths and weaknesses of the libraries, how they work, and examples that show how easy each tool is to use. If you are new to the business, they are very helpful for finding the right library to begin studying your data. In short, they are written by Data Scientists for Data Scientists.

However, as a Software Engineer you would rather know whether these tools are going to work well or just crash your servers. Based on this premise, the main objective of this article is to explore some Machine learning libraries and see how they work in a real-time, semi-production scenario.

Monitoring the Spanish 2015 General Elections

We’re just a couple of days away from the Spanish general elections and Twitter is boiling over with campaign-related messages. People want to have a say in what goes on in their country and they turn to Twitter to express their opinions and feelings.

Social networks are starting to play a very important role in political events in Spain, which is why candidates from different parties are actively seeking to make the most of their presence on these types of platforms. They apply different strategies that allow them to connect with people and, hopefully, win their votes.

At Stratio we have been monitoring the campaign with our real-time data aggregation system, Stratio Sparkta, and with our visualization tool, Stratio Viewer. We use Apache Spark to process the data and MongoDB to store it.

Huawei Appoints Stratio as Technology Partner

Proud to share the press release announcing Stratio as Huawei’s technological partner and looking forward to working together.

AMSTERDAM, Nov. 5, 2015 /PRNewswire/ — Huawei announced that Stratio has officially been certified as a Huawei Solution Partner (Technology) for Enterprise Data Centre Solutions.

Stratio, which pioneered the first Big Data platform using Apache Spark and integrating the main NoSQL and SQL distributed databases, becomes Huawei’s first Big Data Technology Partner. The Stratio platform reduces complexity compared to other platforms by giving customers control over all their Big Data software, and reduces Big Data time-to-value tenfold.

MongoDB – Spark Connector Whitepaper

We recently worked with MongoDB and their developer team to compare their Hadoop-based connector with our native connector solution. The paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API, which lets Spark pass the required columns and filters down to the data source and so avoid scanning the entire collection.

Our connector supports the Spark Catalyst optimizer for both rule-based and cost-based query optimization.
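
To make the difference concrete: in Spark's data sources API, `TableScan` exposes a `buildScan()` that takes no arguments, while `PrunedFilteredScan` exposes a `buildScan(requiredColumns, filters)`. The following is a minimal plain-Python sketch of that idea, not connector code. The function names, the filter tuples and the sample tweets are all hypothetical, and a real connector would translate the filters and columns into a MongoDB query rather than loop over documents in memory.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Document = Dict[str, Any]
# Hypothetical filter shape for this sketch: (field, operator, value).
Filter = Tuple[str, str, Any]

# Only two operators are supported in this illustration.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "gt": lambda a, b: a > b,
}


def table_scan(collection: List[Document]) -> Iterator[Document]:
    """TableScan-style read: every document, with every field, reaches the engine."""
    for doc in collection:
        yield dict(doc)


def pruned_filtered_scan(
    collection: List[Document],
    required_columns: List[str],
    filters: List[Filter],
) -> Iterator[Document]:
    """PrunedFilteredScan-style read: the source applies filters and projection itself.

    Here the work is simulated in Python; in a real connector it would be
    delegated to the database so that unneeded documents are never shipped.
    """
    for doc in collection:
        if all(f in doc and OPS[op](doc[f], value) for f, op, value in filters):
            yield {column: doc.get(column) for column in required_columns}


# Made-up sample data.
tweets: List[Document] = [
    {"_id": 1, "party": "A", "retweets": 12, "text": "first"},
    {"_id": 2, "party": "B", "retweets": 3, "text": "second"},
    {"_id": 3, "party": "B", "retweets": 40, "text": "third"},
    {"_id": 4, "party": "C", "retweets": 7, "text": "fourth"},
]

# Full scan: the engine receives everything and must filter and project afterwards.
full = list(table_scan(tweets))
engine_side = [
    {"party": d["party"], "retweets": d["retweets"]} for d in full if d["retweets"] > 10
]

# Pruned and filtered scan: only the matching rows and requested columns come back.
pushed_down = list(
    pruned_filtered_scan(tweets, ["party", "retweets"], [("retweets", "gt", 10)])
)
```

Both paths produce the same answer; the difference is how much data has to leave the source before the answer is known.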

100 Stratians and counting

When we first started using Spark, we were twenty people. Twenty Stratians. We took a risk and adopted Spark very early on, but with a lot of teamwork and a lot of mistakes, we managed to create the first pure Spark platform.

We started getting more projects, and without realizing it 20 turned into 40, 40 into 60, and 60 into 100 Stratians. And we haven’t stopped growing ever since.

Stratio Sparkta 0.5.0 release

It’s been almost two months since we introduced Stratio Sparkta at Strata London 2015, showing a demo for real-time insights on Twitter hashtags (slides available here).

During this time we have added some new features to the real-time aggregation engine based on Spark Streaming, but we have focused mainly on stabilizing the project and laying the groundwork for an upcoming web tool.

In particular, we have been working hard to improve the syntax of the aggregation policy, which has been completely revised. Since you don’t need to code anything in Spark Streaming when using Stratio Sparkta (cool, right?), the declarative definition of aggregation policies is quite important to us.
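
As a rough illustration of what "declarative" means here, the sketch below describes an aggregation as data and lets a small generic function carry it out. The policy format, the `apply_policy` helper and the sample events are invented for this example; they are not Sparkta's actual policy syntax, and the code runs in plain Python rather than on Spark Streaming.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]

# Hypothetical policy format (NOT Sparkta's syntax): which fields to group by
# and which operator to apply to each group.
COUNT_PER_HASHTAG_AND_MINUTE = {
    "dimensions": ["hashtag", "minute"],
    "operator": {"type": "count"},
}

RETWEETS_PER_HASHTAG = {
    "dimensions": ["hashtag"],
    "operator": {"type": "sum", "field": "retweets"},
}


def apply_policy(policy: Dict[str, Any], events: List[Event]) -> Dict[Tuple[Any, ...], int]:
    """Aggregate events as described by the policy; the caller writes no loop."""
    operator = policy["operator"]
    totals: Dict[Tuple[Any, ...], int] = defaultdict(int)
    for event in events:
        # The group key is the tuple of dimension values for this event.
        key = tuple(event[d] for d in policy["dimensions"])
        if operator["type"] == "count":
            totals[key] += 1
        elif operator["type"] == "sum":
            totals[key] += event[operator["field"]]
        else:
            raise ValueError(f"unsupported operator: {operator['type']}")
    return dict(totals)


# Made-up sample events.
events: List[Event] = [
    {"hashtag": "#spark", "minute": "10:00", "retweets": 2},
    {"hashtag": "#spark", "minute": "10:00", "retweets": 5},
    {"hashtag": "#spark", "minute": "10:01", "retweets": 1},
    {"hashtag": "#bigdata", "minute": "10:00", "retweets": 4},
]

counts = apply_policy(COUNT_PER_HASHTAG_AND_MINUTE, events)
retweets = apply_policy(RETWEETS_PER_HASHTAG, events)
```

Changing what gets aggregated means editing the policy dictionary, not the processing code, which is the property a declarative definition is meant to provide.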

We have a winner!

Last March we began the first Stratio Challenge. After much deliberation, the wait is over and we are proud to announce that the Stratio Challenge winners are Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco, from BitBang.
Turnout was higher than expected, and even though we only had one prize, we want to recognize, in alphabetical order, the answers submitted by: Alberto Vallejo, Israel Saeta, Javier Cruz, José Luis López Pino and Juan Antonio Cantarero.
We look forward to bringing new challenges and hope you have enjoyed this as much as we have 🙂

The answer to the challenge can be found here.