MongoDB – Spark Connector Whitepaper

We recently worked with MongoDB and their developer team to analyze their Hadoop-based connector versus our native connector solution. The paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API, which lets Spark push the required columns and filters down to the data source and so avoid scanning the entire collection.

Our connector supports the Spark Catalyst optimizer for both rule-based and cost-based query optimization.
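
To make the difference between the two scan styles concrete, here is a minimal Python sketch. It does not use the Spark or Stratio APIs: the sample `COLLECTION` and the functions `table_scan` and `pruned_filtered_scan` are hypothetical names chosen for illustration, and the equality-only filter is a simplifying assumption. In the first style the source returns every document with every field and the engine filters afterwards; in the second, the projection and the predicates are handed to the source, so only matching rows and requested fields come back.

```python
# Conceptual sketch only: hypothetical names, not the Spark Data Sources API.
COLLECTION = [
    {"_id": 1, "user": "ana", "country": "ES", "clicks": 12},
    {"_id": 2, "user": "bob", "country": "US", "clicks": 7},
    {"_id": 3, "user": "eva", "country": "ES", "clicks": 30},
]


def table_scan(collection):
    """TableScan style: the source returns every document with every field."""
    return [dict(doc) for doc in collection]


def pruned_filtered_scan(collection, required_columns, filters):
    """PrunedFilteredScan style: the source receives the projection and filters.

    Here the filtering runs over an in-memory list; a real data source would
    translate `filters` into its own query language instead (assumption).
    """
    rows = []
    for doc in collection:
        # Equality-only predicates, kept deliberately simple for the sketch.
        if all(doc.get(field) == value for field, value in filters.items()):
            rows.append({col: doc.get(col) for col in required_columns})
    return rows


# Query: SELECT user, clicks WHERE country = 'ES'

# Without pushdown: fetch everything, then filter and project in the engine.
full = table_scan(COLLECTION)
engine_side = [
    {"user": d["user"], "clicks": d["clicks"]} for d in full if d["country"] == "ES"
]

# With pushdown: the source only returns what the query needs.
pushed_down = pruned_filtered_scan(COLLECTION, ["user", "clicks"], {"country": "ES"})
```

Both paths produce the same rows, but the pushdown path never moves the non-matching document or the unused fields out of the source, which is where the savings come from on a large collection.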

100 Stratians and counting

When we first started using Spark, we were twenty people. Twenty Stratians. We took a risk and adopted Spark very early on, but with a lot of teamwork and a lot of mistakes, we managed to create the first pure Spark platform.

We started getting more projects, and without realizing it 20 turned into 40, 40 into 60, and 60 into 100 Stratians. And we haven’t stopped growing ever since.

Stratio Sparkta 0.5.0 release

It’s been almost two months since we introduced Stratio Sparkta at Strata London 2015, showing a demo of real-time insights on Twitter hashtags (slides available here).

During this time we added some new features to the real-time aggregation engine based on Spark Streaming, but we have mainly focused on stabilizing the project and laying the groundwork for an upcoming web tool.

In particular, we have been working hard to improve the syntax of the aggregation policy, which has been completely revised. Since you don’t need to code anything in Spark Streaming when using Stratio Sparkta (cool, right?), the declarative definition of aggregation policies is quite important to us.

We have a winner!

Last March we began the first Stratio Challenge. After many deliberations, the wait is over and we are proud to announce that the Stratio Challenge winners are Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco, from BitBang.
We had a higher turnout than expected, and even though there was only one prize, we want to congratulate, in alphabetical order, the following participants on their answers: Alberto Vallejo, Israel Saeta, Javier Cruz, José Luis López Pino and Juan Antonio Cantarero.
We look forward to bringing new challenges and hope you have enjoyed this as much as we have 🙂

The answer to the challenge can be found here.

A Spark-based analytics solution for Online Advertisers

This post contains the winning solution for the Stratio Challenge 2015, developed by Marco Piva, Leonardo Biagioli, Fabio Fantoni and Andrea De Marco (BitBang).

Abstract:
This work describes the data model and the architecture of a Big Data Analytics solution that can help online advertisers get fast answers to their analytical questions about impressions, clicks and purchases, based on the Stratio Challenge requirements.
The emphasis is placed on the technologies that have been used, with particular focus on Spark and Spray.

Supporting service-based multi realm authentication and authorization

Security is often a forgotten concern in Big Data environments. However, as these technologies are being embraced by companies with sensitive data (think, for example, about banks or insurance companies), security is a growing requirement. At Stratio, we are aware of our clients’ needs, so we are studying the development of an integrated security solution for our platform.

Stratio Release 1.2.0

  • Added HDFS as a persistence option.
  • Added HDFS Connector for Crossdata.
  • Retry button when a node fails to install.
  • UX refactor.
  • Now the Admin machine is the package repository for node installation (no 3rd party repositories needed).
  • Backend bugfixes.
  • Uninstall scripts for RedHat/CentOS.

We welcome Mnemo as our partner

MNEMO is a private Spanish company, founded in 2000 in Madrid, Spain. It has over 1,000 employees and operates in 11 countries (Spain, United States, Mexico, Colombia, Peru, Bolivia, Chile, Ecuador, Great Britain, Turkey and Saudi Arabia), with $94.5 million in turnover. Our strategic commitment has always been to ensure excellent quality of service for our customers, with the aim of achieving the highest levels of loyalty from them, as we consider them our best reference. Currently MNEMO is focused on providing the market with high value-added solutions and services through our two strategic divisions: MNEMO Technology and MNEMO Security.

MNEMO Technology is the MNEMO division in charge of the organisation’s Research and Development projects. Through the Centre for Technology Research and Development it develops software products that stand out for their high level of innovation and their adherence to the latest technological market trends.

MNEMO Security is the MNEMO division that handles implementation of Security projects. Through the CERTs and SOCs from which it operates, MNEMO provides managed security services as well as services to detect and respond to security incidents. It has specialised laboratories in the field of digital forensics, malware and botnet analysis, as well as electronic devices.