Sqoop is a tool designed to transfer data between Hadoop and RDBMS .Sqoop is an open source software product of the Apache Software Foundation. You can use sqoop to import and export data between RDBMS(Oracle or mysql) to Hadoop which provides parallel operation.
Sqoop provides full, incremental, online, offline, distribution and bulk extraction. It allows import and export of Peta Bytes of data. Sqoop is available in few places. The primary home of the project is sqoop.apache.org. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and simple to learn tool for non programmers.
Following are the benefits it offers:
• Reusability: Sqoop functionality can be reused across multiple data sources as it is an API.
• Extensibility: Sqoop is extensible as it is an open source.
• Fault tolerance: Sqoop uses Map Reduce to import and export the data, which provides parallel operation. The solution would be leveraging oozie to rerun failed jobs.
Sqoop work with map only it won’t work with reducer and default mappers are 4.
Before importing the data from RDBMS we should connect the database (default dir:/user/home/)
$ sqoop import –connect jdbc:mysql://localhost(database url)/tablename \
–Username xxxxxx –password 12345;
Grant permissions to user:
Grant permissions to the user to access the database
Grant all on dbname.* to ‘user@localhost’ identified by ‘user’;
Target Dir changing:
• sqoop import –connect jdbc:mysql://localhost/naveen –table tablename –username xxxx –password 12345 –split-by year –target-dir /tmp -m 6 ;
(m6 is executing 6 mappers manually )
• sqoop import –connect jdbc:mysql://localhost/naveen –table naveen_test –username training –password training; (if you have primmary key)
• sqoop import –connect jdbc:mysql://localhost/naveen –table naveen_test –username training –password training –split-by sno(if you don’t have primary key )
Exporting table into RBDMS:
sqoop export \
–connect jdbc:mysql://localhost/test \
–table employees_export \
–staging-table employees_exp_stg \
-m 4 \