Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra

In this post we show how to use the different SQL contexts to query data in Spark.
We begin with Spark SQL and follow up with HiveContext. We also run queries against several NoSQL databases and weigh the advantages and disadvantages of each. Without further ado, let’s get started!

First of all, we need to create a context whose Spark configuration includes the options for connecting to Cassandra:
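
As a minimal sketch of that step, assuming PySpark and the Spark Cassandra connector are available: the helper names `cassandra_settings` and `build_sql_context`, the application name and the `local[*]` master are all hypothetical choices for illustration, and `spark.cassandra.connection.host` is the connector property that points Spark at a Cassandra node.

```python
def cassandra_settings(host):
    """Return the Spark properties needed to reach Cassandra (hypothetical helper)."""
    # spark.cassandra.connection.host is read by the Spark Cassandra connector.
    return {"spark.cassandra.connection.host": host}


def build_sql_context(settings, app_name="sql-context-example", master="local[*]"):
    """Create a SQLContext whose SparkConf carries the given settings.

    PySpark is imported lazily so the settings helper above can be used
    (and checked) on a machine without Spark installed.
    """
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName(app_name).setMaster(master)
    for key, value in settings.items():
        conf.set(key, value)  # copy each connection option into the Spark configuration

    sc = SparkContext(conf=conf)
    return SQLContext(sc)


# The address is an assumption: replace it with one of your Cassandra nodes.
settings = cassandra_settings("127.0.0.1")
```

Calling `build_sql_context(settings)` would then start a local Spark context with the Cassandra option already in place; the connector itself still has to be on the classpath.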

Spark SQLContext allows us to connect to different data sources to read or write data, but it has a limitation: the links to the data sources we create are temporary, so when the program ends or the Spark shell is closed they are lost and will not be available in the next session.

Stratio Release 1.2.0

  • Added HDFS as a persistence technology option.
  • Added HDFS Connector for Crossdata.
  • Retry button when a node fails to install.
  • UX refactor.
  • Now the Admin machine is the package repository for node installation (no 3rd party repositories needed).
  • Backend bugfixes.
  • Uninstall scripts for Red Hat/CentOS.

Top-k queries in Cassandra: An embedded mapreduce approach

Stratio has just added top-k query support to its Lucene-based implementation of Cassandra’s secondary indexes. This implementation was originally designed to allow embedded full-text and multivariable search in Apache Cassandra. The previous release included an ad-hoc mechanism to perform distributed relevance queries based on Lucene’s scoring algorithm. The current release generalizes this mechanism to allow several types of top-k queries.
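
To give an idea of how a distributed top-k query can work in general, here is a small sketch of the usual map/reduce pattern: every node computes its own local top-k (the map step) and a coordinator merges those partial results into the global top-k (the reduce step). This is not Stratio's implementation; the function names, the sample rows and the scores are made up for illustration.

```python
import heapq


def local_top_k(rows, k, score):
    """Map step: each node keeps only its k best rows."""
    return heapq.nlargest(k, rows, key=score)


def merge_top_k(partials, k, score):
    """Reduce step: the coordinator merges the per-node results."""
    candidates = (row for partial in partials for row in partial)
    return heapq.nlargest(k, candidates, key=score)


# Hypothetical (row id, relevance score) pairs held by three nodes.
nodes = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],
    [("d", 0.7), ("e", 0.95)],
    [("f", 0.1), ("g", 0.6), ("h", 0.8)],
]


def by_score(row):
    return row[1]


partials = [local_top_k(rows, 2, by_score) for rows in nodes]
top = merge_top_k(partials, 2, by_score)
print(top)  # [('e', 0.95), ('a', 0.9)]
```

The point of the pattern is that each node sends at most k rows to the coordinator, no matter how many rows it stores, and the merged result is still exact because any row in the global top-k must also be in the top-k of the node that holds it.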

We were at the Cassandra Summit, from 10 to 12 September in San Francisco

The Cassandra Summit 2014 took place on September 10-12 in San Francisco with more than 2000 attendees. The summit was split into 3 days, with training sessions on the first and third days and conference sessions across 6 tracks on the second. The first thing we noticed after passing through the registration desk was how well organized everything was: a smooth registration process, clear track and room directions, and a clear demarcation of areas for sponsors, tracks, etc. In summary, it was one of the best-organized events, held in an appropriate venue and with huge social network activity.