In this post we will show how to use the different SQL contexts for querying data with Spark.
We will begin with Spark's SQLContext and follow up with HiveContext. We will also run queries against several NoSQL databases and weigh the advantages and disadvantages of each, so without further ado, let's get started!
First of all, we need to create a SparkContext whose configuration includes the options for connecting to Cassandra:
// The Cassandra host property assumes the DataStax spark-cassandra-connector is on the classpath; the address is illustrative
val sparkConf = new SparkConf()
  .setAppName("sparkSQLExamples")
  .setMaster("local[*]")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sparkContext = new SparkContext(sparkConf)
Spark's SQLContext allows us to connect to different data sources to read and write data, but it has a limitation: any tables we register against those data sources are temporary, so when the program ends or the Spark shell is closed they are lost and will not be available in the next session.
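To make this concrete, here is a minimal sketch of that behavior, assuming Spark 1.x-style APIs (`SQLContext`, `registerTempTable`); the `Person` case class, names, and ages are illustrative, not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical schema for the example
case class Person(name: String, age: Int)

object SQLContextExample extends App {
  val conf = new SparkConf().setAppName("sqlContextExample").setMaster("local[*]")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  // Build a small DataFrame from an in-memory collection
  val people = sc.parallelize(Seq(Person("Ana", 30), Person("Luis", 25))).toDF()

  // This table lives only in the current session; it is gone once the
  // application stops or the shell is closed
  people.registerTempTable("people")

  val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 28")
  adults.show()

  sc.stop()
}
```

Once `sc.stop()` runs (or the shell exits), the `people` table no longer exists; a new session would have to register it again.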