Tutorial: Migrating Data from MySQL to MongoDB

PREREQUISITES:

  • MongoDB (version 2.6 recommended)
  • MySQL 5
  • Java 7+
  • Maven 3+
  • Spark 1.2
  • Deep-Spark


LOAD THE TUTORIAL DATASET INTO MySQL.

Create schema:
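For example (the database name "football" is a placeholder used throughout the rest of this tutorial; adapt it to your own dataset):

  CREATE DATABASE football;
  USE football;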

Create tables:
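A minimal sketch with two tables, team and player; the column names are illustrative, not the exact schema of the tutorial dataset:

  CREATE TABLE team (
    id   INT PRIMARY KEY,
    name VARCHAR(64)
  );

  CREATE TABLE player (
    id        INT PRIMARY KEY,
    firstname VARCHAR(64),
    lastname  VARCHAR(64),
    team_id   INT,
    FOREIGN KEY (team_id) REFERENCES team(id)
  );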

Populate tables:
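A couple of sample rows, assuming the illustrative schema above:

  INSERT INTO team (id, name) VALUES (1, 'Real Madrid'), (2, 'FC Barcelona');

  INSERT INTO player (id, firstname, lastname, team_id) VALUES
    (1, 'Iker',   'Casillas', 1),
    (2, 'Lionel', 'Messi',    2);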

Running the Spark shell:
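Launch spark-shell with the Deep jars and the MySQL connector on the classpath. The paths below assume the jars from the installation guide at the end of this tutorial sit in the current directory:

  ./spark-1.2.0-bin-hadoop2.4/bin/spark-shell \
    --jars deep-core-0.7.0.jar,deep-commons-0.7.0.jar,deep-mongodb-0.7.0.jar,deep-jdbc-0.7.0.jar,mysql-connector-java-5.1.34.jar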

USING SPARK STEP BY STEP.

Necessary imports:
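The imports below follow the Deep 0.7 package layout as far as we can tell; if they do not resolve, the example project linked later contains the authoritative versions:

  import com.stratio.deep.core.context.DeepSparkContext
  import com.stratio.deep.commons.entity.{Cell, Cells}
  import com.stratio.deep.jdbc.config.JdbcConfigFactory
  import com.stratio.deep.mongodb.config.MongoConfigFactory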

Creating a configuration for the player RDD and initializing it:
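A sketch of the JDBC configuration. The builder method names reflect our recollection of the Deep 0.7 API, and the connection details (host, credentials, database and table names) are placeholders to adapt; a second, analogous configuration is created for the team table:

  // JDBC extractor configuration for the player table (values are placeholders)
  val playerConfig = JdbcConfigFactory.createJdbc()
    .host("localhost").port(3306)
    .username("root").password("root")
    .driverClass("com.mysql.jdbc.Driver")
    .database("football").table("player")
    .initialize()

  // same pattern for the team table
  val teamConfig = JdbcConfigFactory.createJdbc()
    .host("localhost").port(3306)
    .username("root").password("root")
    .driverClass("com.mysql.jdbc.Driver")
    .database("football").table("team")
    .initialize()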

Creating the RDDs that represent the data set in MySQL:
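Assuming a DeepSparkContext can be built around the shell's SparkContext (again, see the example project for the exact constructor), the two source RDDs of Cells are created from the configurations above:

  val deepContext = new DeepSparkContext(sc)   // assumption: wraps the shell's existing SparkContext
  val playersRDD = deepContext.createRDD(playerConfig)
  val teamsRDD   = deepContext.createRDD(teamConfig)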

Map teams to pairs of (team id, team):
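Each element of the source RDDs is a Cells object (one per row); here we key every team by its id. The getCellByName/getCellValue accessors are our reading of the Cells entity API:

  // (team id, team row)
  val teamsById = teamsRDD.map(team => (team.getCellByName("id").getCellValue, team))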

Map players to pairs of (team id, player) and group them by team id:
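Same idea for players, keyed by the team_id foreign key and then grouped, so each key ends up with all the players of that team:

  // (team id, all players of that team)
  val playersByTeam = playersRDD
    .map(player => (player.getCellByName("team_id").getCellValue, player))
    .groupByKey()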

Creating a configuration for the MongoDB result RDD and initializing it:
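A sketch of the output configuration; as on the JDBC side, the builder methods are our recollection of the Deep 0.7 MongoDB config and the database/collection names are placeholders:

  // MongoDB output configuration (values are placeholders)
  val mongoConfig = MongoConfigFactory.createMongoDB()
    .host("localhost:27017")
    .database("football")
    .collection("teams")
    .initialize()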

Transforming the joined result into the desired structure for MongoDB:
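We join the two pair RDDs on the team id and build one output Cells document per team, embedding its players. The Cell.create / Cells.add calls are assumptions about the entity API; the point of the example is the shape of the document (a team with an embedded list of its players):

  val resultRDD = teamsById.join(playersByTeam).map { case (teamId, (team, players)) =>
    val doc = new Cells()
    doc.add(Cell.create("id", teamId))
    doc.add(Cell.create("name", team.getCellByName("name").getCellValue))
    // embed the team's players as a list inside the team document
    doc.add(Cell.create("players", players.map(_.getCellByName("lastname").getCellValue).toList))
    doc
  }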

Save the RDD to MongoDB:
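Finally, write the transformed RDD using the MongoDB configuration; in Deep 0.7 this is, to our recollection, a static saveRDD helper on DeepSparkContext:

  // persist the team documents into the configured MongoDB collection
  DeepSparkContext.saveRDD(resultRDD, mongoConfig)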

USING SPARK WITH OUR EXAMPLE PROJECT.

https://github.com/robertomorandeira/deep-example
Just clone the repository and run our Java or Scala example:
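For instance (the packaged jar name and the exact spark-submit invocation depend on the project's pom.xml, so treat the last line as a sketch):

  git clone https://github.com/robertomorandeira/deep-example.git
  cd deep-example
  mvn clean package
  # illustrative: submit the Java example; adjust the class and jar names to what the build produces
  spark-submit --class FootballMigrationApp target/deep-example-1.0-SNAPSHOT.jar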

Java example:
https://github.com/robertomorandeira/deep-example/blob/master/src/main/java/FootballMigrationApp.java
Scala example:
https://github.com/robertomorandeira/deep-example/blob/master/src/main/scala/FootballMigrationAppScala.scala
CHECKING DATA.

Connect to MongoDB as usual:
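For example, with the mongo shell (the database name is the placeholder used in the output configuration above):

  mongo football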

You can see the data loaded into MongoDB:
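Again assuming the placeholder collection name from the configuration sketch above:

  > db.teams.find().pretty()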

INSTALLATION GUIDE FOR PREREQUISITE COMPONENTS:
MongoDB
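On a Debian/Ubuntu system, a quick way to get a server running is the distribution package; note it may not be exactly 2.6, for which the MongoDB documentation describes a repository-based install:

  sudo apt-get install -y mongodb
  sudo service mongodb start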

MySQL Server
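For example, on Debian/Ubuntu:

  sudo apt-get install -y mysql-server
  mysql -u root -p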

Java 7
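For example, OpenJDK 7 on Debian/Ubuntu:

  sudo apt-get install -y openjdk-7-jdk
  java -version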

Maven
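For example, on Debian/Ubuntu:

  sudo apt-get install -y maven
  mvn -version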

Spark

Download Spark:
http://ftp.cixug.es/apache/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
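Then unpack it, for example:

  wget http://ftp.cixug.es/apache/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
  tar -xzf spark-1.2.0-bin-hadoop2.4.tgz
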
Deep-Spark

Download the Deep libraries:
http://search.maven.org/remotecontent?filepath=com/stratio/deep/deep-mongodb/0.7.0/deep-mongodb-0.7.0.jar
http://search.maven.org/remotecontent?filepath=com/stratio/deep/deep-core/0.7.0/deep-core-0.7.0.jar
http://search.maven.org/remotecontent?filepath=com/stratio/deep/deep-commons/0.7.0/deep-commons-0.7.0.jar
http://search.maven.org/remotecontent?filepath=com/stratio/deep/deep-jdbc/0.7.0/deep-jdbc-0.7.0.jar


Download the MySQL connector library:

http://search.maven.org/remotecontent?filepath=mysql/mysql-connector-java/5.1.34/mysql-connector-java-5.1.34.jar

2 Comments

  1. srinu says:

    Hi,

    I have a table which has to be moved to Mongo, and I want to include two of the fields, which are region coordinates, as a single column under location. Can you please explain how that can be done?

    • Stratio says:

      Hi, you would have to do a map to transform the data in the table and combine the two fields into a single attribute, such as a StructType in Spark SQL: "location": {"coordinates": [-73.856077, 40.848447], "type": "Point"}

      To use Spark with MongoDB, use the MongoDB Data Source for Spark: https://github.com/Stratio/Spark-MongoDB
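      A sketch of that idea with the Spark DataFrame API, where df stands for the DataFrame read from the MySQL table and lon/lat are hypothetical names for the two coordinate columns:

        // combine two coordinate columns into a single GeoJSON-style struct column
        import org.apache.spark.sql.functions.{array, col, lit, struct}

        val withLocation = df
          .withColumn("location", struct(
            array(col("lon"), col("lat")).as("coordinates"),
            lit("Point").as("type")))
          .drop("lon")
          .drop("lat")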
