
Sqoop : Hadoop : Other Sqoop capabilities explored

In the last article, Sqoop : Hadoop : Importing data from RDBMS to HDFS, we explored the basic feature of importing data into HDFS using Sqoop. In this article we will explore some other very important tools that Sqoop provides.

Data Export using Sqoop
Sqoop provides the following features:

  • Data Import to HDFS – Single as well as all tables from DB
  • Data Export from HDFS to RDBMS
  • Merging results from consequent imports
  • Listing Databases and Tables
  • Running Sqoop Metastore
  • Working with saved jobs
  • Evaluating potential queries.

In this article we will explore some of these features.

Prerequisites :

Sqoop 1.4.1 and Hadoop 1.0.1 must be installed and configured properly. You can get the instructions here

For this article we will create the Sqoop options file without the import command, so that we can use it with any tool we want. The example options file (/home/hduser/bigdatadboptions.txt) will look like this:

--connect
jdbc:mysql://10.225.65.158:3306/bigdata
--username
root
--password
test
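The options file above can be generated with a heredoc. This is a minimal sketch: it writes the file to the current directory rather than /home/hduser, and the connect string, username, and password are the example values from this article — substitute your own.

```shell
# Write the Sqoop options file: one option or value per line,
# exactly as it would appear on the command line.
cat > bigdatadboptions.txt <<'EOF'
--connect
jdbc:mysql://10.225.65.158:3306/bigdata
--username
root
--password
test
EOF
```

Any Sqoop tool can then pick these options up via `--options-file bigdatadboptions.txt`, which keeps the connection details out of your shell history.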

1. Data Import from RDBMS to HDFS

  • Single table import

We explored single-table data import in the last article. You can read the article here.

  • Multiple Table import

To import data from all tables in a database, let's use a database named bigdata with two tables, employee and skills, containing some data. We can import all tables from bigdata using the following command:

../bin/sqoop import-all-tables --options-file /home/hduser/bigdatadboptions.txt

After running this, Sqoop spawns two MapReduce jobs to import both tables. We can verify this by listing HDFS:

[hduser@slave1 importtemps]$ hadoop dfs -ls /user/hduser
Found 2 items

drwxr-xr-x - hduser supergroup 0 2012-08-09 05:53 /user/hduser/employee
drwxr-xr-x - hduser supergroup 0 2012-08-09 05:54 /user/hduser/skills

2. Data Export

For this, let's assume we have a MySQL table called skills with the following fields:

+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| skill_id   | int(11)     | NO   | PRI | NULL    |       |
| skill_name | varchar(50) | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
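If you want to follow along, the skills table can be recreated from the DESCRIBE output above. This is a sketch: it only writes the DDL to a local file, and the `mysql` invocation in the comment assumes the root/bigdata credentials used throughout this article.

```shell
# DDL reconstructed from the DESCRIBE output of the skills table;
# load it with e.g.:  mysql -u root -p bigdata < skills.sql
cat > skills.sql <<'EOF'
CREATE TABLE skills (
  skill_id INT NOT NULL PRIMARY KEY,
  skill_name VARCHAR(50) NOT NULL
);
EOF
```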

We also have a file in HDFS at /user/hduser/skills containing the following two comma-separated records:

1,Java
2,Hadoop
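The input file can be prepared like this. A minimal sketch: the records are created locally first, and the `hadoop dfs -put` step is commented out because it needs a running cluster; the file name skills.csv is just an illustrative choice.

```shell
# Create the two comma-separated records locally.
printf '1,Java\n2,Hadoop\n' > skills.csv

# Copy the file into HDFS so sqoop export can read it
# (commented out: requires a running Hadoop cluster):
# hadoop dfs -put skills.csv /user/hduser/skills/skills.csv
```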

Now we run the export command as follows:

../bin/sqoop export --options-file /home/hduser/bigdatadboptions.txt --table skills --export-dir /user/hduser/skills

As a result, Sqoop spawns a MapReduce job and inserts both records into the skills table. The verified output from the skills table is as follows:

+----------+------------+
| skill_id | skill_name |
+----------+------------+
|        1 | Java       |
|        2 | Hadoop     |
+----------+------------+

3. Listing databases and tables

Listing the databases and tables in an RDBMS can also be done with Sqoop, using the following commands:

  • Listing Databases

../bin/sqoop list-databases --options-file /home/hduser/bigdatadboptions.txt

  • Listing Tables in a Database

../bin/sqoop list-tables --options-file /home/hduser/bigdatadboptions.txt
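These listing tools combine naturally with single-table import when you want more control than import-all-tables gives you. The sketch below hardcodes the two table names from this article; in practice you would capture the output of list-tables. The sqoop call itself is commented out since it needs a live cluster and database.

```shell
# Import tables one at a time instead of using import-all-tables.
for table in employee skills; do
  echo "importing table: $table"
  # ../bin/sqoop import --options-file /home/hduser/bigdatadboptions.txt --table "$table"
done
```

Per-table imports let you skip tables or vary options (such as the target directory) for each one.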

This is by no means a comprehensive post about Sqoop's capabilities. It should, however, give you a basic understanding of Sqoop's most important features and how to work with them.

More comprehensive Sqoop documentation can be found here: http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

Thanks and Happy Coding


