Since Stratio’s creation in 2014, we have published a total of 86 posts on our blog. We would like to congratulate and thank all those Stratians who have written posts and taught us about their specialities and discoveries in relation to Spark, Machine Learning, Deep Learning, Scala, business, Kafka… We know that it is hard to find time to read all of the blog posts, so here is a recap of the 3 most-read posts published on our blog!
In the previous post about Apache Ignite, we learnt how to set up and create either a simple cache or a SQL cache, and how to share the cached data between different nodes. In this post, we will dig a little deeper. We will see what happens if our app crashes and the cached data disappears. How can Ignite help us avoid this problem?
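The usual answer is Ignite’s native persistence, which keeps a copy of the cached data on disk so it survives a node restart. As a minimal sketch (assuming Ignite 2.x, where persistence is switched on per data region in the node configuration), the relevant Spring XML fragment looks roughly like this:

```xml
<!-- Sketch: enable Ignite native persistence for the default data region -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <!-- Cached data is also written to disk, so it survives a crash -->
          <property name="persistenceEnabled" value="true"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

Note that when persistence is enabled, an Ignite cluster starts in an inactive state and must be activated before caches can be used.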
Stratio has created a new user interface that allows you to work without writing a single line of code, which means that no programming skills are needed, nor any expertise in advanced technologies such as Spark, Scala, HDFS, or Elasticsearch. Developers, architects, BI engineers, data scientists, business users and IT administrators can create data analytics applications in minutes with a powerful Spark Visual Editor. Welcome to Sparta 2.0, a brand-new version of Sparta born with the forthcoming release of the Stratio Data Centric Platform.
On March 26th, 2012, James Cameron and his submarine craft, Deepsea Challenger, explored the depths of the ocean down to 11 km below sea level at 11.329903°N 142.199305°E, an infinitesimal point on the surface of the Earth’s vast oceans. Can you imagine how incredible it would be to have thousands of “Deepsea Challengers” reaching the bottom of our planet in parallel? What a map we would get!
A follow-up to this post will be presented at Spark Summit East in Boston in February. Find out more.
Amongst all the Big Data technology madness, security seems to be an afterthought at best. When people talk about Big Data technologies and security, they are usually referring to the integration of these technologies with Kerberos. This trend, however, seems to be changing for the better, and we now have a few security options for these technologies, such as TLS. Against this backdrop, we would like to take a look at the interaction between the most popular large-scale data processing technology, Apache Spark, and the most popular authentication framework, MIT’s Kerberos.
By Sondos Atwi @Sondos_4
On the 17th and 18th of November, I attended the Big Data Spain conference. It was my first time attending this type of event, and it was an excellent opportunity to meet experts in the field and attend high-quality talks. So I decided to write this post to share a few of the presented slides and ideas.
PS: Please excuse the quality of some slides/pictures; they were all taken with my phone camera!
First, congrats to Big Data Spain on being the second biggest Big Data conference in Europe, right after O’Reilly Strata. This year’s edition also saw an increase in attendance of around 50% over last year’s!
Now let’s dig into the details…
Nowadays, there are a lot of Big Data query engines available. Some companies struggle to choose which one to use. Benchmarks exist, but results can be contradictory and thus difficult to trust.
One Big Data query engine that is frequently mentioned is Presto. We wanted to find out more about its potential and decided to compare it with Crossdata in a controlled environment, given that Crossdata is a data hub that extends the capabilities of Apache Spark. We found that the most popular persistence layers in our projects are Apache Cassandra, MongoDB and HDFS+Parquet, but that MongoDB is not supported by Presto. The benchmark was therefore carried out with Apache Cassandra and HDFS+Parquet only.
Crossdata provides additional features and optimizations on top of Spark’s SQLContext through the XDContext. It can be deployed as a library of Apache Spark or using a client-server architecture in which the cluster of servers forms a P2P structure.
Implicit parameters and conversions are powerful tools in Scala, increasingly used to develop concise, versatile constructs such as DSLs, APIs and libraries…
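As a quick illustration (plain Scala, with hypothetical names), here is how an implicit parameter and an implicit conversion look in practice:

```scala
// Minimal sketch of Scala implicits: an implicit parameter and an implicit class.
object ImplicitsDemo {
  // Implicit parameter: the compiler supplies `prefix` from the implicit scope.
  implicit val greeting: String = "Hello"

  def greet(name: String)(implicit prefix: String): String =
    s"$prefix, $name!"

  // Implicit conversion via an implicit class: Int gains a `times` method.
  implicit class RichInt(n: Int) {
    def times(s: String): String = s * n
  }

  def main(args: Array[String]): Unit = {
    println(greet("Spark")) // resolved with the implicit `greeting`
    println(3.times("ab"))  // 3 is implicitly wrapped in RichInt
  }
}
```

This is the kind of mechanism DSLs and fluent APIs build on: callers get concise syntax while the compiler wires in the conversions and parameters behind the scenes.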
In this post we will show how to use the different SQL contexts for data query on Spark.
Proud to share the press release announcing Stratio as Huawei’s technological partner and looking forward to working together.
AMSTERDAM, Nov. 5, 2015 /PRNewswire/ — Huawei announced that Stratio has officially been certified as a Huawei Solution Partner (Technology) for Enterprise Data Centre Solutions.
Stratio, which pioneered the first Big Data platform using Apache Spark and integrating the main NoSQL and SQL distributed databases, becomes Huawei’s first Big Data Technology Partner. The Stratio platform reduces complexity compared to other platforms, by giving customers control over all their Big Data software, and cuts Big Data time-to-value tenfold.