Continuous delivery in depth #1

Following the previous Lunch and learn, viewable at Stratio’s youtube channel about how Jenkins is being used for Stratio’s Continuous Delivery jobs, further info seemed to be logical, regarding our Jenkins pipeline plugin usage.

For this first issue, we will just follow how pipelines are being used at Stratio Big Data to achieve full lifecycle traceability, from a the development team to a final productive environment.

Some pitfalls were mentioned and will be explained, to fully comprehend the nature of the underlying bug and the solution achieved. This will be in issue #2.

The Developer’s Guide to Scala Implicit Values (Part I)

Implicit parameters and conversions are powerful tools in Scala increasingly used to develop concise, versatile tools such as DSLs, APIs, libraries…

When used correctly, they reduce the verbosity of Scala programs thus providing easy to read code. But their most important feature is that they offer a way to make your libraries functionality extendible without having to change their code nor needing to distribute it.

A great power comes with a great responsibility however. For new comers, as well as for relatively experienced Scala users, they can become a galaxy of confusions and pitfalls derived from the fact that the use of implicit values imply the compiler making decisions not obviously described in the code and following a set of rules with some unexpected results.

This post pretends to shed some light on the use of implicit values. Its content isn’t 100% original, it is just a tourist guide through this full of marvels, and sometimes dangerous, code jungle.
As most of those monstrous things that make us shiver, implicit values are mostly harmless once you get to know them.

Benchmarking Machine learning prediction models

When surfing the internet, it is quite easy to find sites comparing the most popular Machine learning toolkits (datascience.stackexchange.com, oreilly.com or udacity.com ). These sites give you a lot of information about the strengths and weaknesses of the libraries, how they work and some examples to compare how easy it is to use these types of tools. Therefore, if you are new to the business, they are very helpful for finding the right library to begin to study your data. Actually, they are written by Data Scientists for Data Scientists.

However, as a Software Engineer you would rather know if these tools are going to work well or just crash your servers. Based on this premise, the main objective of this article is to explore some Machine learning libraries and see how they work in a real time semi-production scenario.

Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra

In this post we will show how to use the different SQL contexts for data query on Spark.
We will begin with Spark SQL and follow up with HiveContext. In addition to this, we will conduct queries on various NoSQL databases and analyze the advantages / disadvantages of using them, so without further ado, let’s get started!

First of all we need to create a context that will add Spark to the configuration options for connecting to Cassandra:

Spark SQLContext allows us to connect to different Data Sources to write or read data from them, but it has limitations, namely that when the program ends or the Spark shell is closed, all links to the datasoruces we have created are temporary and will not be available in the next session.

How to Aggregate Data in Real-Time with Stratio Sparta

When working with Big Data, it’s frequent to have the need to aggregate data in real-time, whether it comes from a specific service, such as social networks (Twitter, Facebook…) or even from more diverse sources, like a weather station. A good way to process these large amounts of information is with Spark Streaming, it provides us all the data in real time, but it has one problem: you have to program it yourself.

Monitoring the Spanish 2015 General Elections

We’re just a couple of days away from the Spanish general elections and Twitter is boiling up with campaign related messages. People want to have a say in what goes on in their country and they turn Twitter to express their opinions and feelings.

 

Social networks are starting to play a very important role in political events in Spain, that is why candidates from different parties are actively seeking to get the most profit from their presence in these type of platforms. They apply different strategies that allow them to connect with the people and, hopefully, gain their votes.

 

At Stratio we have been monitoring the campaign with our real-time data aggregation system, Stratio Sparkta, and with our visualization tool, Stratio Viewer. We use Apache Spark to to process the data and MongoDB to store it.

Huawei Appoints Stratio as Technology Partner

Proud to share the press release announcing Stratio as Huawei’s technological partner and looking forward to working together.

AMSTERDAM, Nov. 5, 2015 /PRNewswire/ — Huawei announced that Stratio has officially been certified as a Huawei Solution Partner (Technology) for Enterprise Data Centre Solutions.

Stratio, which pioneered the first Big Data platform using Apache Spark and integrating main NoSQL and SQL distributed databases, becomes Huawei’s first Big Data Technology Partner. The Stratio platform reduces complexity compared to other platforms, by giving customers control over all their Big Data software, and reduces Big Data time-to-value tenfold.

Variance in Scala (“Luke, he is your father too”)

When working with Big Data, sometimes it’s useful to remember that powerful products wouldn’t work properly without the tools that build them. It’s possible to start programming in Scala with a few case classes and a bunch of for-comprehensions, but those are only little scratches in a huge ice surface like Scala is. It may not be enough to make your code clean and comprehensible.  I’ve been developing with this programming language for almost 4 years, and every day I discover a new feature that surprises me. That acknowledgement, in the end, is the main reason to keep digging deeper into Scala.

Stratio’s Lucene-based index for Cassandra is now a plugin

Thanks to the changes proposed at CASSANDRA-8717CASSANDRA-7575 and CASSANDRA-6480, Stratio is glad to present its Lucene-based implementation of Cassandra secondary indexes as a plugin that can be attached to the Apache distribution. Before the above changes, Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties it implied, i.e. maintaining a fork. As of now, the fork is discontinued and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra.

 

Stratio’s Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities such as with ElasticSearch or Solr, including full text search capabilities, free multivariable search,