

Big Data!! Simple Solutions!!!

Knowledge about Apache Sqoop and all its basic commands to import and export the data
November 28, 2019

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.

We can say that Sqoop is a connector from RDBMS to Hadoop (import) and from Hadoop to RDBMS (export).

Options while importing data to Hadoop:

1. Importing table data from an RDBMS table to HDFS (file system)
2. Importing table data from an RDBMS table to a Hive table
3. Importing table data from an RDBMS table to HBase

Options while exporting data to RDBMS:

1. Exporting data from HDFS (file system) to an RDBMS table
2. Exporting data from Hadoop (Hive) to an RDBMS table

List all databases in MySQL

sqoop-list-databases --connect jdbc:mysql://localhost --username hadoop --password hadoop

List all tables in MySQL

sqoop-list-tables --connect jdbc:mysql://localhost/hive --username hadoop -P

Note: If we pass -P as a parameter, Sqoop asks for the password at run time, so the password is not hard-coded in the command, for security reasons.
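A minimal sketch of such a run (the exact prompt text may vary between Sqoop versions):

sqoop-list-tables --connect jdbc:mysql://localhost/hive --username hadoop -P
Enter password:    (the typed password is not echoed and does not appear in the shell history)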

Pass a parameter file to Sqoop

sqoop-list-tables --options-file /root/Anamika_Singh/sqoop_param

The sqoop_param options file (one option or value per line) can be created with cat > sqoop_param and contains:

--connect
jdbc:mysql://localhost/hive
--username
hadoop
--password
hadoop

Import table data to HDFS (the output file is delimited text by default)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1

If there is no primary key in the table, we need to set the number of mappers to 1, i.e., a sequential import of the data; otherwise we can explicitly specify the number of mappers for a parallel import.
If there is no primary key in the table and we still need a parallel import, use --split-by "some column name" and specify any number of mappers; that many part-m files will be generated.

By default, if the table has a primary key or --split-by is used, Sqoop runs 4 mappers, and we can see the 4 mappers' output files in HDFS.

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept
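As an illustrative variation (the mapper count here is only an example, not from the original post), --split-by can be combined with an explicit number of mappers, which produces that many part-m files:

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept -m 8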

Import table data to HDFS (import only specific columns)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --columns "ENo,ESal"

Import table data to HDFS (tab-separated file format)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --fields-terminated-by '\t' --table Employee -m 1

Import table data to HDFS (save to a target directory)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --target-dir /user/Employee

Import table data to HDFS (WHERE condition)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --where "ESal > 50000"

Import table data to HDFS (split by a column for parallel import)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee --split-by EDept

Import table data to Hive (create table & load data)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --hive-import --create-hive-table

Import table data to Hive (table already exists, only load data)

sqoop-import --options-file /root/Anamika_Singh/sqoop_param --table Employee -m 1 --hive-import --hive-table emp_hive

Import table data to Hive (by passing a partition key & value)

sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-partition-key order_status --hive-partition-value "CLOSED" --query 'select order_id, order_date, order_customer_id from orders1 where order_status="CLOSED" and $CONDITIONS' --hive-overwrite --create-hive-table --delete-target-dir -m 1 --hive-database Anamika_Singh --target-dir /user/hive/warehouse/Anamika_Singh.db/orders;

sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-partition-key order_status --hive-partition-value "CLOSED" --query 'select order_id, order_date, order_customer_id from orders1 where order_status="CLOSED" and $CONDITIONS' --delete-target-dir -m 1 --hive-database Anamika_Singh --target-dir /user/hive/warehouse/Anamika_Singh.db/orders;

Arguments:
--create-hive-table               Fail if the target Hive table exists
--hive-import                     Import tables into Hive
--hive-overwrite                  Overwrite existing data in the Hive table
--hive-table <table-name>         Sets the table name to use when importing to Hive
--hive-database <database-name>   Sets the database name to use when importing to Hive
--delete-target-dir               Delete the target directory in the default path
--exclude-tables                  Exclude some tables from the import

--import-all-tables:
For the import-all-tables tool to be useful, the following conditions must be met:
1. Each table must have a single-column primary key (not a composite, multi-column primary key).
2. You must intend to import all columns of each table.
3. You must not intend to use a WHERE clause.

sqoop import-all-tables --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --warehouse-dir /user/hive/warehouse/Anamika_Singh.db -m 1 --exclude-tables orders1,orders_temp;
This tool imports a set of tables from an RDBMS to HDFS.
Data from each table is stored in a separate directory in HDFS.
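As a rough illustration (the table names below are only examples from the retail_db sample database), the warehouse directory then contains one sub-directory per imported table:

hadoop fs -ls /user/hive/warehouse/Anamika_Singh.db
/user/hive/warehouse/Anamika_Singh.db/categories
/user/hive/warehouse/Anamika_Singh.db/customers
/user/hive/warehouse/Anamika_Singh.db/products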

--incremental data load to a Hive table with last value

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-database Anamika_Singh --hive-table orders1 --incremental append --check-column order_id --last-value 22 -m 1 --hive-import --table orders1;

--incremental data load to a Hive table with last modified date

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --hive-import --hive-table orders1 --hive-database Anamika_Singh --incremental lastmodified --check-column order_date --last-value '2013-07-25 00:00:08' -m 1 --table orders1 --append;
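For reference, the three incremental-import arguments used in these commands are (as described in the standard Sqoop documentation):

--incremental <mode>   append (new rows identified by a growing key) or lastmodified (rows changed since a timestamp)
--check-column <col>   the column examined to decide which rows to import
--last-value <value>   the maximum check-column value from the previous import; only rows beyond it are imported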

--Export data from Hadoop (Hive) to an RDBMS table

sqoop-export --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table emp --export-dir '/user/hive/warehouse/Anamika_Singh.db/emp_bk' --input-null-string '\\N' --input-null-non-string '\\N';

Export Failure Cases:

Exports may fail for a number of reasons:
1. Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault or server software crashes)
2. Attempting to INSERT a row which violates constraints (for example, inserting a duplicate primary key value)
3. Attempting to parse records using incorrect delimiters
4. Capacity issues (such as insufficient RAM or disk space)

codegen generates the .java, .class & .jar files for the executed Sqoop job

sqoop codegen --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table ordr;
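As an illustrative note (the exact path depends on the installation and the user running the command), the generated files are typically written under a temporary compile directory and reported at the end of the codegen run, for example:

/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.java
/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.class
/tmp/sqoop-cloudera/compile/<generated-hash>/ordr.jar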

eval prints the output of the query on the screen

sqoop eval --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --query "select * from emp where deptno=20";

Writing a Sqoop job

sqoop job --create myjob \
--import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee -m 1

sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob
sqoop job --delete myjob;

Sqoop Incremental Job :


sqoop job \
--create myjob \
--import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--incremental append \
--check-column order_id \
--last-value 0 \
--hive-import \
--table orders1;

sqoop job --list;
sqoop job --exec myjob;
sqoop job --delete myjob;

Importing data from an RDBMS table to HBase:

1. To an existing HBase table (only load)

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders --hbase-table orders_hbase --column-family cf1 --hbase-row-key order_id

2. To a new HBase table (create & load)

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders --hbase-table orders_hbase --hbase-create-table --column-family cf1 --hbase-row-key order_id
