Sqoop, short for “SQL-to-Hadoop”, is a tool used to transfer data between relational databases and Hadoop. It has a connector-based architecture that supports plugins providing connectivity to external systems.
Features of Sqoop:
- Import whole tables, or whole schemas, from relational databases into Hadoop’s file system, HDFS.
- Export data from HDFS back to these relational databases. Both import and export are performed through MapReduce jobs.
- Import using a particular SQL SELECT statement, rather than importing whole tables.
- Perform incremental loads, specifying a check column to determine which new rows to import.
- Load directly into Hive tables, creating HDFS files in the background.
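As an example of the incremental-load feature, a sketch of a Sqoop command that appends only rows newer than the last imported value (the connection string, table, and column names here are illustrative, borrowed from the example later in this tutorial):

```shell
# Import only rows whose id is greater than the last imported value (100).
# --incremental append, --check-column and --last-value are standard Sqoop flags.
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail \
  --username root -P \
  --table books \
  --incremental append \
  --check-column id \
  --last-value 100
```

On completion, Sqoop prints the new `--last-value` to use for the next incremental run.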
What you need to get started:
Sqoop, Hive and MySQL installed on your machine.
For this tutorial, I have used the Cloudera QuickStart VM (CDH 5.3), which comes with Sqoop, MySQL and Hive installed.
Let’s get started:
Step 1. If you already have tables in MySQL, go to Step 2.
Connect to MySQL with your username and password. Once connected, create a table in the MySQL database.
Populate your table with some data.
Verify your table.
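A minimal sketch of this step, assuming the `retail` database and `books` table used later in this tutorial (the column names and sample rows are illustrative, not from the original):

```shell
# Create, populate and verify a sample table in the retail database.
# You will be prompted for the MySQL root password.
mysql -u root -p retail <<'SQL'
CREATE TABLE books (
  id     INT PRIMARY KEY,
  title  VARCHAR(100),
  author VARCHAR(100)
);
INSERT INTO books VALUES
  (1, 'Hadoop: The Definitive Guide', 'Tom White'),
  (2, 'Programming Hive', 'Edward Capriolo');
SELECT * FROM books;
SQL
```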
Step 2. Open Hive. Go to the terminal, type hive, and don’t forget to press Enter.
Create the database into which you want to import the MySQL table.
Initially there will be no tables in your database. Here we have created the database “retaildb”. To check whether there are any tables in the database, type “show tables;”.
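This step can also be run non-interactively from the terminal with hive -e (a sketch; the database name retaildb is the one used in this tutorial):

```shell
# Create the target database and confirm it is empty.
hive -e "CREATE DATABASE IF NOT EXISTS retaildb; USE retaildb; SHOW TABLES;"
```

`SHOW TABLES;` should print nothing at this point, since the database was just created.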
Step 3. For Sqoop, go back to the terminal/command line. Run the following Sqoop command to import the table from MySQL into Hive (the Hadoop data warehouse) for further analysis.
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail \
--username root -P -m 1 \
--table books \
--hive-import \
--hive-database retaildb \
--hive-table books \
--fields-terminated-by '\t'
Starting the transfer: after pressing Enter, you can see the logs.
For a few minutes the logs will keep generating, showing the status of loading the data into the Hadoop environment. The final log lines report the time taken, indicating the import has completed.
Step 4. Verify whether the Hive table has been imported properly with the data from MySQL.
The table now exists in the Hive database, along with its data. We are ready to analyze the data in Hadoop, and it is available for users to query through Hadoop.
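The verification can be done with a quick query from the terminal (a sketch, using the retaildb database and books table imported above):

```shell
# Confirm the imported table exists and inspect a few rows.
hive -e "USE retaildb; SHOW TABLES; SELECT * FROM books LIMIT 10;"
```

If the import succeeded, `SHOW TABLES;` lists `books` and the `SELECT` prints the rows that were in the MySQL table.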
Thanks for reading !!