MongoDB-Spark Connector Whitepaper

We recently worked with MongoDB and their developer team to compare their Hadoop-based connector with our native connector solution. The paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API. With PrunedFilteredScan, Spark passes the required columns and the query filters to the data source, which effectively allows you to avoid scanning the entire collection.
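To make the difference between the two scan styles concrete, here is a minimal conceptual sketch in plain Python. It is not Spark or Stratio code: the in-memory `COLLECTION`, the function names, and the `(field, op, value)` filter tuples are all hypothetical stand-ins. It only models the idea that a TableScan-style source hands back everything, while a PrunedFilteredScan-style source receives the required columns and filters and applies them itself.

```python
# Conceptual model only: plain Python stand-ins, not the Spark data source API.
import operator
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]
Filter = Tuple[str, str, Any]  # hypothetical (field, operator, value) form

# Hypothetical in-memory "collection" standing in for MongoDB.
COLLECTION: List[Document] = [
    {"name": "Ana", "age": 31, "city": "Madrid"},
    {"name": "Luis", "age": 17, "city": "Sevilla"},
    {"name": "Eva", "age": 45, "city": "Madrid"},
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def table_scan() -> List[Document]:
    """TableScan style: return every document with every field."""
    return [dict(doc) for doc in COLLECTION]


def pruned_filtered_scan(required_columns: List[str],
                         filters: List[Filter]) -> List[Document]:
    """PrunedFilteredScan style: the source itself applies filters and
    returns only the requested columns."""
    rows = []
    for doc in COLLECTION:
        matches = all(
            field in doc and OPS[op](doc[field], value)
            for field, op, value in filters
        )
        if matches:
            rows.append({col: doc.get(col) for col in required_columns})
    return rows


# Query: names of people older than 18.
everything = table_scan()  # all 3 documents, all fields, leave the source
via_full_scan = [{"name": d["name"]} for d in everything if d["age"] > 18]
via_pushdown = pruned_filtered_scan(["name"], [("age", ">", 18)])
```

Both paths give the same answer, but in the second one only the matching documents, and only the `name` field, ever leave the source.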

Our connector supports the Spark Catalyst optimizer for both rule-based and cost-based query optimization. To operate against multi-structured data, the connector infers the schema by sampling documents from the MongoDB collection. This process is controlled by the samplingRatio parameter. If the schema is already known, the developer can provide it to the connector, avoiding the need for any inference. Once data is stored in MongoDB, results can be integrated with any BI tool through the ODBC/JDBC connector that Stratio provides.
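The idea of inferring a schema from a sample can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the connector's implementation: the function `infer_schema`, its `seed` argument, and the rule of falling back to a string type when a field shows conflicting types are all hypothetical. Only the notion of a sampling ratio comes from the text above.

```python
# Illustrative sketch of sampling-based schema inference (not connector code).
import random
from typing import Any, Dict, List

Document = Dict[str, Any]


def infer_schema(documents: List[Document],
                 sampling_ratio: float = 1.0,
                 seed: int = 0) -> Dict[str, str]:
    """Infer a {field: type name} mapping from a random sample of documents."""
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    rng = random.Random(seed)
    # Keep each document with probability sampling_ratio.
    sample = [doc for doc in documents if rng.random() < sampling_ratio]

    schema: Dict[str, str] = {}
    for doc in sample:
        for field, value in doc.items():
            seen = type(value).__name__
            previous = schema.get(field)
            if previous is None or previous == seen:
                schema[field] = seen
            else:
                # Assumption: conflicting types are widened to a string type.
                schema[field] = "str"
    return schema


docs = [
    {"_id": 1, "name": "Ana"},
    {"_id": 2, "name": "Luis", "age": 17},
    {"_id": "x3", "tags": ["a"]},
]

full_schema = infer_schema(docs, sampling_ratio=1.0)
sampled_schema = infer_schema(docs, sampling_ratio=0.5)
```

A lower ratio reads fewer documents but can miss fields that appear only in the documents left out, which is one reason to supply the schema directly when it is already known.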

The connector can be downloaded from the community Spark Packages repository. Installation is simple: the connector can be included in a Spark application with a single command. One of the main advantages of implementing Spark's DataFrame API is that you can integrate different data sources; for example, you could join a MongoDB collection with an Elasticsearch index.

Many thanks to Mat Keep and Sam Weaver from MongoDB, and to our team of developers, for carrying out the analysis. Download the whitepaper here.

4 Comments

Jopina, 6 years ago

Such a useful article, and very interesting to read. I would like to thank you for the effort you put into writing it.

Maria, 5 years ago

Thanks for the thorough explanation. A good summary of a simple but often misunderstood topic. This blog is fascinating and makes me want to know more about it.
Thanks for sharing this blog, and keep on sharing these kinds of useful posts.

Lovely, 5 years ago

Hiya, I’m really glad I have found this information.
Today bloggers publish only about gossip and the net, and this is really annoying. A good web site with interesting content is what I need. Thanks for keeping this web site; I’ll be visiting it.

Summer, 5 years ago

Thanks for the mention, and your site looks great! This is a great explanation; you are so thorough. Much appreciated!
