

Big Data!! Simple Solutions!!!

Knowledge about Apache Sqoop and all its basic commands to import and export the data
November 28, 2019

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.

We can say that Sqoop is a connector from RDBMS to Hadoop (import) and from Hadoop to RDBMS (export).

Options while importing data to Hadoop:

1. Importing table data from an RDBMS table to HDFS (file system)
2. Importing table data from an RDBMS table to a Hive table
3. Importing table data from an RDBMS table to HBase

Options while exporting data to RDBMS:

1. Exporting data from HDFS (file system) to an RDBMS table
2. Exporting data from Hadoop (Hive) to an RDBMS table

List all databases in MySQL

sqoop-list-databases --connect jdbc:mysql://localhost --username hadoop --password hadoop

List all tables in MySQL

sqoop-list-tables --connect jdbc:mysql://localhost/hive --username hadoop -P

Note: If we pass -P as a parameter, Sqoop asks for the password at run time, so the password is not hard-coded in the command, for security reasons.
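A minimal sketch of such a run (the exact prompt text may vary between Sqoop versions):

sqoop-list-tables --connect jdbc:mysql://localhost/hive --username hadoop -P
Enter password:    (the typed password is not echoed and does not appear in the shell history)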

Pass a parameter file to Sqoop

sqoop-list-tables --options-file /root/Anamika_Singh/sqoop_param

The sqoop_param options file (one option or value per line) can be created with cat > sqoop_param and contains:

--connect
jdbc:mysql://localhost/hive
--username
hadoop
--password
hadoop

Import table data to HDFS (the output file is delimited text by default)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1

If there is no primary key in the table, we need to set the number of mappers to 1, i.e., a sequential import of the data; otherwise we can explicitly specify the number of mappers for a parallel import.
If there is no primary key in the table and we still need a parallel import, use --split-by "some column name" and specify any number of mappers; that many part-m files will be generated.

By default, if the table has a primary key or --split-by is used, Sqoop runs 4 mappers, and we can see the 4 mappers' output files in HDFS.

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept
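As an illustrative variation (the mapper count here is only an example, not from the original post), --split-by can be combined with an explicit number of mappers, which produces that many part-m files:

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept -m 8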

Import table data to HDFS (import only specific columns)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --columns "ENo,ESal"

Import table data to HDFS (tab-separated file format)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --fields-terminated-by '\t' --table Employee -m 1

Import table data to HDFS (save to a target directory)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --target-dir /user/Employee

Import table data to HDFS (WHERE condition)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --where "ESal > 50000"

Import table data to HDFS (split by a column for parallel import)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept

Import table data to Hive (create table & load data)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --hive-import --create-hive-table

Import table data to Hive (table already exists, only load data)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --hive-import --hive-table emp_hive

Import table data to Hive (by passing a partition key & value)

sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-partition-key order_status --hive-partition-value "CLOSED" --query 'select order_id, order_date, order_customer_id from orders1 where order_status="CLOSED" and $CONDITIONS' --hive-overwrite --create-hive-table --delete-target-dir -m 1 --hive-database Anamika_Singh --target-dir /user/hive/warehouse/Anamika_Singh.db/orders;

sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-partition-key order_status --hive-partition-value "CLOSED" --query 'select order_id, order_date, order_customer_id from orders1 where order_status="CLOSED" and $CONDITIONS' --delete-target-dir -m 1 --hive-database Anamika_Singh --target-dir /user/hive/warehouse/Anamika_Singh.db/orders;

Arguments:
--create-hive-table               Fail if the target Hive table exists
--hive-import                     Import tables into Hive
--hive-overwrite                  Overwrite existing data in the Hive table
--hive-table <table-name>         Sets the table name to use when importing to Hive
--hive-database <database-name>   Sets the database name to use when importing to Hive
--delete-target-dir               Delete the target directory in the default path
--exclude-tables                  Exclude some tables from the import

--import-all-tables:
For the import-all-tables tool to be useful, the following conditions must be met:
1. Each table must have a single-column primary key (not a composite, multi-column primary key).
2. You must intend to import all columns of each table.
3. You must not intend to use a WHERE clause.

sqoop import-all-tables --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --warehouse-dir /user/hive/warehouse/Anamika_Singh.db -m 1 --exclude-tables orders1,orders_temp;
This tool imports a set of tables from an RDBMS to HDFS.
Data from each table is stored in a separate directory in HDFS.
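As a rough illustration (the table names below are only examples from the retail_db sample database), the warehouse directory then contains one sub-directory per imported table:

hadoop fs -ls /user/hive/warehouse/Anamika_Singh.db
/user/hive/warehouse/Anamika_Singh.db/categories
/user/hive/warehouse/Anamika_Singh.db/customers
/user/hive/warehouse/Anamika_Singh.db/products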

--incremental data load to a Hive table with last value

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-database Anamika_Singh --hive-table orders1 --incremental append --check-column order_id --last-value 22 -m 1 --hive-import --table orders1;

--incremental data load to a Hive table with last modified date

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-database Anamika_Singh --incremental lastmodified --check-column order_date --last-value '2013-07-25 00:00:08' -m 1 --table orders1 --append;
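For reference, the three incremental-import arguments used in these commands are (as described in the standard Sqoop documentation):

--incremental <mode>   append (new rows identified by a growing key) or lastmodified (rows changed since a timestamp)
--check-column <col>   the column examined to decide which rows to import
--last-value <value>   the maximum check-column value from the previous import; only rows beyond it are imported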

--Export data from Hadoop (Hive) to an RDBMS table

sqoop-export --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table emp --export-dir '/user/hive/warehouse/Anamika_Singh.db/emp_bk' --input-null-string '\\N' --input-null-non-string '\\N';

Export Failure Cases:

Exports may fail for a number of reasons:
1. Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault or server software crashes)
2. Attempting to INSERT a row which violates constraints (for example, inserting a duplicate primary key value)
3. Attempting to parse records using incorrect delimiters
4. Capacity issues (such as insufficient RAM or disk space)

codegen generates the .java, .class & .jar files for the executed Sqoop job

sqoop codegen --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table ordr;
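As an illustrative note (the exact path depends on the installation and the user running the command), the generated files are typically written under a temporary compile directory and reported at the end of the codegen run, for example:

/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.java
/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.class
/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.jar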

eval prints the output of the query on the screen

sqoop eval --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --query "select * from emp where deptno=20";

Writing a Sqoop job

sqoop job --create myjob \
--import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee -m 1

sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob
sqoop job --delete myjob;

Sqoop Incremental Job :


sqoop job \
--create myjob \
--import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--incremental append \
--check-column order_id \
--last-value 0 \
--hive-import \
--table orders1;

sqoop job --list;
sqoop job --exec myjob;
sqoop job --delete myjob;

Importing data from an RDBMS table to HBase:

1. To an existing HBase table (only load)

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders --hbase-table orders_hbase --column-family cf1 --hbase-row-key order_id

2. To a new HBase table (create & load)

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders --hbase-table orders_hbase --hbase-create-table --column-family cf1 --hbase-row-key order_id
