In the last article Sqoop : Hadoop : Importing data from RDBMS to HDFS we explored the basic feature of importing data into HDFS using Sqoop. In this article we will explore other important tools that Sqoop provides.
Sqoop provides the following features:
- Data Import to HDFS – Single as well as all tables from DB
- Data Export from HDFS to RDBMS
- Merging results from consequent imports
- Listing Databases and Tables
- Running Sqoop Metastore
- Working with saved jobs
- Evaluating potential queries.
We will explore some of these features below.
Prerequisites:
Sqoop 1.4.1 and Hadoop 1.0.1 must be installed and configured properly. You can find the instructions here.
For this article we will create the Sqoop options file without the import command, so that we can reuse it with any Sqoop tool we want. The example options file (/home/hduser/bigdatadboptions.txt) looks like this:
--connect
jdbc:mysql://10.225.65.158:3306/bigdata
--username
root
--password
test
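For reference, the options file can be generated with a heredoc. The contents below are copied from the example above; the file is written to /tmp here so the sketch runs anywhere, while the article keeps it at /home/hduser/bigdatadboptions.txt. Note that Sqoop expects each option and each value on its own line.

```shell
# recreate the example Sqoop options file (one option or value per line)
cat > /tmp/bigdatadboptions.txt <<'EOF'
--connect
jdbc:mysql://10.225.65.158:3306/bigdata
--username
root
--password
test
EOF
```

Keeping the password in a file like this is convenient for a tutorial, but on shared machines you may prefer Sqoop's `-P` flag, which prompts for the password instead.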
1. Data Import from RDBMS to HDFS
- Single table import
We explored single-table data import in the last article. You can read the article here.
- Multiple Table import
To import data from all tables in a database, let's use a database with two tables, “employee” and “skills”, with some data in them. We can import all tables from the database bigdata using the following command:
../bin/sqoop import-all-tables --options-file /home/hduser/bigdatadboptions.txt
After running this, Sqoop spawns two MapReduce jobs to import both tables. We can verify this by listing HDFS:
[hduser@slave1 importtemps]$ hadoop dfs -ls /user/hduser
Found 2 items
drwxr-xr-x - hduser supergroup 0 2012-08-09 05:53 /user/hduser/employee
drwxr-xr-x - hduser supergroup 0 2012-08-09 05:54 /user/hduser/skills
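To spot-check the rows themselves, one could cat the part files Sqoop wrote under the import directory. A small sketch (the path assumes the default import location shown above; the command is guarded so it is a no-op on machines without a Hadoop client):

```shell
# print the comma-separated rows Sqoop wrote for the employee table
CAT_CMD='hadoop dfs -cat /user/hduser/employee/part-m-*'
# only run where a Hadoop client is installed
command -v hadoop >/dev/null && eval "$CAT_CMD"
echo "$CAT_CMD"
```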
2. Data Export
For this, let's assume we have a MySQL table called skills with the following fields:
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| skill_id   | int(11)     | NO   | PRI | NULL    |       |
| skill_name | varchar(50) | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
We also have a file in HDFS at /user/hduser/skills with the following two comma-separated records:
1,Java
2,Hadoop
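If that file is not in HDFS yet, it can be created and uploaded as follows. This is a sketch: `skills.csv` is a local file name chosen here, and the upload step is guarded so it only runs where a Hadoop client exists.

```shell
# build the two-record input file locally
printf '1,Java\n2,Hadoop\n' > skills.csv
# upload it to the export directory (no-op without a Hadoop client)
command -v hadoop >/dev/null && hadoop dfs -put skills.csv /user/hduser/skills/
```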
Now we run the export command as follows:
../bin/sqoop export --options-file /home/hduser/bigdatadboptions.txt --table skills --export-dir /user/hduser/skills
As a result, Sqoop spawns a MapReduce job and inserts both records into the skills table. The verified output from the skills table is as follows:
+----------+------------+
| skill_id | skill_name |
+----------+------------+
|        1 | Java       |
|        2 | Hadoop     |
+----------+------------+
3. Listing databases and tables
The databases in the RDBMS, and the tables within a database, can be listed using the following commands:
- Listing Databases
../bin/sqoop list-databases --options-file /home/hduser/bigdatadboptions.txt
- Listing Tables in a Database
../bin/sqoop list-tables --options-file /home/hduser/bigdatadboptions.txt
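One more feature from the list above, evaluating queries, lets you run ad-hoc SQL against the source database before committing to a full import or export. A minimal sketch using the sqoop eval tool (the query against the skills table is an example chosen here; the command is guarded so it only runs where Sqoop is on the PATH):

```shell
# sqoop eval executes the given SQL and prints the result set to the console
EVAL_CMD='sqoop eval --options-file /home/hduser/bigdatadboptions.txt --query "SELECT COUNT(*) FROM skills"'
# only run where sqoop is installed
command -v sqoop >/dev/null && eval "$EVAL_CMD"
echo "$EVAL_CMD"
```

This is handy for sanity-checking connectivity and row counts before kicking off the heavier MapReduce-based tools.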
This is by no means a comprehensive post about the capabilities of Sqoop, but it should give you a basic understanding of its most important features and how to work with them.
More comprehensive Sqoop documentation can be found here: http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html
Thanks and Happy Coding
