Tutorial: Migrating Data from MySQL to MongoDB

In the next tutorial you will learn how to migrate data from MySQL to MongoDB. We will show you how to do it using Spark step by step. From creating a configuration for the player RDD to the installation guide for prerequisites components.
Easy and intuitive!

PREREQUISITES*:

  • MongoDB (versión 2.6 recommended)
  • MySQL 5
  • Java 7+
  • Maven 3+
  • Spark 1.2
  • Deep-Spark


TUTORIAL TO LOAD A DATASET TO MySQL:

Create schema:

create schema football

Create tables:

create table if not exists football.team(
    id bigint not null auto_increment,
    name varchar(255) not null,
    short_name varchar(255) not null,
    arena_name varchar(255) not null,
    coach_name varchar(255) not null,
    city_name varchar(255) not null,
    league_name varchar(255) not null,
    primary key (id)
);
create table if not exists football.player(
    id bigint not null auto_increment,
    firstname varchar(255) not null,
    lastname varchar(255) not null,
    date_of_birth date,
    place_of_birth_name varchar(255) not null,
    position_name varchar(255) not null,
    team_id bigint not null,
    primary key (id),
    foreign key (team_id) references football.team(id)
);

Populate tables:

insert into football.team (name, short_name, arena_name, coach_name, city_name, league_name) 
values
('FC Bayern München', 'FCB', 'Allianz Arena', 'Josep Guardiola', 'München', 'Bundesliga'),
('Hamburger SV', 'HSV', 'Imtech Arena', 'Josef Zinnbauer', 'Hamburg', 'Bundesliga'),
('Herta BSC Berlin', 'Herta', 'Olympiastaion Berlin', 'Jos Luhukay', 'Berlin', 'Bundesliga'),
('FC Basel 1893', 'FCB', 'St. Jakob-Park', 'Paulo Sousa', 'Basel', 'Raiffeisen Super League'),
('FC Paris Saint-Germain', 'PSG', 'Parc des Princes', 'Laurent Blanc', 'Paris', 'Ligue 1'),
('HJK Helsinki', 'HJK', 'Sonera Stadium', 'Mika Lehkosuo', 'Helsinki', 'Veikkausliiga');
insert into football.player(firstname, lastname, date_of_birth, place_of_birth_name, position_name, team_id) 
values
('Manuel', 'Neuer', '1986-3-27', 'Gelsenkirchen', 'Goalkeeper', 1),
('Julian', 'Schieber', '1989-2-13', 'Backnang', 'Centre Forward', 3),
('Dennis ', 'Diekmeier', '1989-10-20', 'Thedinghausen', 'Right Wing', 2),
('Zlatan', 'Ibrahimovic', '1981-10-03', 'Malmö', 'Centre Forward', 5),
('Xabier', 'Alonso', '1981-11-25', 'Tolosa', 'Midfielder', 1);

Running Spark-shel:

 ./spark-shell --jars YOUR_PATH/deep-core-0.7.0.jar, 
                      YOUR_PATH/deep-commons-0.7.0.jar,YOUR_PATH/deep-jdbc-0.7.0.jar, 
                      YOUR_PATH/deep-mongodb-0.7.0.jar,YOUR_PATH/mongo-java-driver-2.12.4.jar,YOUR_PATH/mysql-connector-java-5.1.34.jar

USING SPARK STEP BY STEP:

Necessary imports:

import com.stratio.deep.commons.entity.{Cell, Cells}
import com.stratio.deep.core.context.DeepSparkContext
import com.stratio.deep.jdbc.config.{JdbcConfigFactory, JdbcDeepJobConfig}
import com.stratio.deep.mongodb.config.{MongoConfigFactory, MongoDeepJobConfig}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConverters._

private val CLUSTER: String = "local"
private val APP_NAME: String = "deepMySQLToMongodbMigration"
private val MYSQL_DRIVER: String = "com.mysql.jdbc.Driver"
private val MYSQL_HOST: String = "localhost"
private val MYSQL_PORT: Integer = 3306
private val MYSQL_USER: String = "root"
private val MYSQL_PASS: String = "root"
private val MYSQL_DBNAME: String = "football"
private val MONGODB_HOST: String = "localhost:27017"
private val MONGODB_DBNAME: String = "football"
private val MONGODB_COLLECTION: String = "teams"eating a configuration for the team RDD and initialize it
val mySQLTeamConfig: JdbcDeepJobConfig[Cells] = JdbcConfigFactory        
    .createJdbc
    .host(MYSQL_HOST).port(MYSQL_PORT)
    .username(MYSQL_USER).password(MYSQL_PASS)
    .database(MYSQL_DBNAME).table("team")
    .driverClass(MYSQL_DRIVER)
    .initialize

Creating a configuration for the player RDD and initialize it:

val mySQLPlayerConfig: JdbcDeepJobConfig[Cells] = JdbcConfigFactory      
    .createJdbc
    .host(MYSQL_HOST).port(MYSQL_PORT)
    .username(MYSQL_USER).password(MYSQL_PASS)
    .database(MYSQL_DBNAME).table("player")
    .driverClass(MYSQL_DRIVER)
    .initialize

Creating the RDDs that represent the data set in MySQL:

val teamRDD: RDD[Cells] = deepContext.createRDD(mySQLTeamConfig)
val playerRDD: RDD[Cells] = deepContext.createRDD(mySQLPlayerConfig)

Map teams to pair with (team id, team):

val teamPairRDD: RDD[(Long, Cells)] = teamRDD.map(team => (team.getLong("id").longValue(), team))

Map players to pair with (team id, player) and group by team_id

val playerPairRDD: RDD[(Long, Iterable[Cells])] = playerRDD.map(cells =>(cells.getLong("team_id").longValue(), cells)).groupByKey
val joinedPairRDD: RDD[(Long, (Cells, Option[Iterable[Cells]]))] = teamPairRDD.leftOuterJoin(playerPairRDD)

Creating a configuration for the mongodb result RDD and initialize it:

val outputMongodbConfig: MongoDeepJobConfig[Cells] = MongoConfigFactory.createMongoDB
.host(MONGODB_HOST)
.database(MONGODB_DBNAME).collection(MONGODB_COLLECTION)
.initialize

Transforming the joined result to the desirable structure in mongodb:

// Ej: {_id: <team_id>, name: <team_name>, players: [<player_name_1>, <player_name_2>]}
 val outputRDD: RDD[Cells] = joinedPairRDD.map(joined => {
    val cells: Cells = new Cells
    cells.add(Cell.create("_id", joined._1))
    cells.add(Cell.create("name", joined._2._1.getString("name")))
    cells.add(Cell.create("players", joined._2._2 match {
        case Some(players) => players.map(player => player.getString("lastname") + ", " + player.getString("firstname")).asJava
        case _ => null
    }))
    cells
})

Save the RDD in MongoDB

DeepSparkContext.saveRDD(outputRDD, outputMongodbConfig)


USING SPARK WITH OUR EXAMPLE PROJECT:

https://github.com/robertomorandeira/deep-example 
Just make a git clone a run our java or scala example

git clone https://github.com/robertomorandeira/deep-example.git

Java example:
https://github.com/robertomorandeira/deep-example/blob/master/src/main/java/FootballMigrationApp.java 
Scala example:
https://github.com/robertomorandeira/deep-example/blob/master/src/main/scala/FootballMigrationAppScala.scala 


CHECKING DATA.

Connect to mongodb, normally

$ mongo --host 127.0.0.1 --port 27017 football
db.output.find(“teams”).pretty()

You can see the data loaded in mongoDB


INSTALLATION GUIDE FOR PREREQUISITES COMPONENTS:

MongoDB

sudo apt-get update
sudo apt-get install mongodb-org

MySQL Server

sudo apt-get install mysql-server

Java 7

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

Maven

sudo apt-get update
sudo apt-get install maven

2 comments on “Tutorial: Migrating Data from MySQL to MongoDB”

  1. Hi,

    I have a table which has to be moved to mongo and I want to include two of the fields which are region co ordinates as a single column under location. So can you please explain how it can be done.?

    1. Hi, you would have to do a map to transform the data in the table and add the two fields to a single attribute such as StructType in Spark SQL: “location”:{“coordinates”:[-73.856077,40.848447],”type”:”Point”}

      For using Spark with MongoDB, use the Data Source of Mongodb for Spark: https://github.com/Stratio/Spark-MongoDB

Leave a Reply

Your email address will not be published. Required fields are marked *