Integrating Hadoop with Enterprise RDBMS Using Apache SQOOP and Other Tools


Guy Harrison, Quest Software




©2011 Quest Software, Inc. All rights reserved.
Introductions




Agenda


• Scenarios for RDBMS-Hadoop interaction
• Case study: Quest extension to SQOOP
• Other RDBMS-Hadoop integrations




Hadoop meets RDBMS – scenarios




Scenario #1: Reference data in RDBMS




[Diagram: WEBLOGS in HDFS; PRODUCTS and CUSTOMERS tables in the RDBMS]
Scenario #2: Hadoop for off-line analytics




[Diagram: HDFS alongside the RDBMS, which holds the PRODUCTS, CUSTOMERS, and SALES HISTORY tables]
Scenario #3: MapReduce output to RDBMS

[Diagram: WEBLOGS in HDFS; a WEBLOGS SUMMARY table in the RDBMS, accessed by a DB QUERY TOOL]
Scenario #4: Hadoop as RDBMS “active archive”
[Diagram: a QUERY TOOL over the RDBMS, which holds SALES 2008 through SALES 2011; SALES 2008 and SALES 2009 also appear in HDFS]
Case Study: extending SQOOP for Oracle




SQOOP extensibility
• SQOOP implements a generic approach to
  RDBMS/Hadoop data transfer
• But database optimization is highly platform specific
 • Each RDBMS has distinct optimization strategies


• For Oracle, optimization requires:
 • Bypassing Oracle caching layers
 • Avoiding Oracle optimizer meddling
 • Exploiting Oracle metadata to balance mapper load




Reading from Oracle – default SQOOP

[Diagram: two MAPPERs, one reading ID > 0 and ID < MAX/2, the other reading ID > MAX/2. Each mapper has its own ORACLE SESSION, with the Oracle CACHE shown between the sessions. Each session performs an index RANGE SCAN across index blocks to reach the ORACLE TABLE.]
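
The slide shows each mapper being given a contiguous range of ID values. A minimal sketch of that kind of key-range partitioning follows, assuming an integer key whose minimum and maximum are already known. The function names `split_id_range` and `where_clause` are hypothetical and are not part of SQOOP's API.

```python
def split_id_range(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into contiguous, near-equal ID ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_id
    for i in range(num_mappers):
        # The first `extra` mappers take one additional ID each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more mappers than IDs: leave the rest idle
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def where_clause(column, lo, hi):
    """Render one range as the predicate a mapper would append to its query."""
    return f"{column} >= {lo} AND {column} <= {hi}"


if __name__ == "__main__":
    for lo, hi in split_id_range(1, 100, 2):
        print(where_clause("ID", lo, hi))
```

Equal-width key ranges only balance the work when key values are evenly distributed, and they say nothing about where the rows physically live in the table.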
Oracle – parallelism gone bad (1)



[Diagram: HDFS, four Hadoop Mappers, and the Oracle SALES table]
Oracle – parallelism gone bad (2)


[Diagram: HDFS, four HADOOP MAPPERs, and the ORACLE TABLE]
Ideal architecture



[Diagram: four HADOOP MAPPERs, each paired with its own ORACLE SESSION, between HDFS and the ORACLE TABLE]
Design goals


• Partition data based on physical storage
• Bypass Oracle buffering
• Bypass Oracle parallelism
• Do not require or use indexes
• Never read the same data block more than once
• Support Oracle datatypes
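
The first and fifth goals can be illustrated with a small sketch: hand each storage extent to exactly one mapper, and keep the number of blocks per mapper roughly even. This is an illustration under stated assumptions, not the connector's actual algorithm. The extent identifiers, block counts, and the function name `assign_extents` are hypothetical, and the balancing rule used is a simple greedy "largest extent to least-loaded mapper" heuristic.

```python
import heapq


def assign_extents(extents, num_mappers):
    """Assign each (extent_id, block_count) pair to exactly one mapper.

    Returns a list with one entry per mapper, each a list of extent ids.
    Greedy rule: take extents largest first and give each one to the
    mapper with the fewest blocks assigned so far.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    heap = [(0, m) for m in range(num_mappers)]  # (blocks assigned, mapper)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_mappers)]
    for extent_id, blocks in sorted(extents, key=lambda e: e[1], reverse=True):
        load, mapper = heapq.heappop(heap)
        assignment[mapper].append(extent_id)
        heapq.heappush(heap, (load + blocks, mapper))
    return assignment


if __name__ == "__main__":
    # Hypothetical extents: (identifier, number of blocks).
    demo = [("e1", 8), ("e2", 8), ("e3", 4), ("e4", 4), ("e5", 4), ("e6", 4)]
    for mapper, owned in enumerate(assign_extents(demo, 2)):
        print(mapper, owned)
```

Because every extent is owned by a single mapper, no block is read twice, and the partitioning needs no index on the table.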




Import Throughput
[Chart: Elapsed Time (ms), 0 to 7,000, against Number of mappers, 0 to 35. Series: SQOOP; SQOOP with Quest Connector]


16 mappers, 50M rows, 50 GB clustered data

Pct reduction:
• IO time: 98.71
• IO requests: 99.08
• Network round trips: 98.95
• CPU time: 89.72
• Elapsed time: 80.84
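
A percentage reduction can be easier to compare once it is restated as a "times less" factor: a reduction of p percent leaves (100 - p) percent of the original, so the factor is 100 / (100 - p). The sketch below applies that arithmetic to the figures on this slide. The helper name `speedup_from_reduction` is hypothetical.

```python
def speedup_from_reduction(pct_reduction):
    """Convert a percentage reduction into a 'times less' factor."""
    if not 0 <= pct_reduction < 100:
        raise ValueError("pct_reduction must be in the range [0, 100)")
    return 100.0 / (100.0 - pct_reduction)


# Percentage reductions as listed on the slide.
REDUCTIONS = {
    "IO time": 98.71,
    "IO requests": 99.08,
    "Network round trips": 98.95,
    "CPU time": 89.72,
    "Elapsed time": 80.84,
}

if __name__ == "__main__":
    for metric, pct in REDUCTIONS.items():
        factor = speedup_from_reduction(pct)
        print(f"{metric}: {pct}% reduction, about {factor:.1f}x less")
```

For example, the 80.84 percent reduction in elapsed time corresponds to the job finishing in roughly one fifth of the original time.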



Export Throughput
[Chart: Seconds, 500 to 3,000, against No of mappers, 0 to 25. Series: SQOOP; SQOOP with Quest Connector]



Export load
[Chart: Database time (s), 0 to 30,000, against No of mappers, 0 to 30. Series: SQOOP; SQOOP with Quest Connector]



Working with the SQOOP framework
• SQOOP lets you concentrate on the RDBMS logic, not
 the Hadoop plumbing:
 • Extend ManagerFactory (what to handle)
 • Extend ConnManager (DB connection and metadata)
 • For imports:
  • Extend DataDrivenDBInputFormat (gets the data)
    • Data allocation (getSplits())
    • Split serialization (“io.serializations” property)
    • Data access logic (createDBRecordReader(), getSelectQuery())
      • Implement progress (nextKeyValue(), getProgress())

 • Similar procedure for extending exports
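
The extension points above are Java classes and methods. As a language-neutral illustration of the last import step (reading rows for one split and reporting progress), here is a small Python stand-in. The names `Split`, `RecordReader`, `next_key_value`, and `get_progress` are hypothetical Python analogues chosen to echo nextKeyValue() and getProgress(); they are not SQOOP or Hadoop classes, and `fetch_rows` stands in for whatever query the real reader would issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One unit of work: an inclusive range of key values."""
    lo: int
    hi: int


class RecordReader:
    """Iterates over the rows of a single split and tracks progress."""

    def __init__(self, split, fetch_rows):
        self.split = split
        self._rows = iter(fetch_rows(split))  # fetch_rows is a stand-in
        self._read = 0
        self._expected = split.hi - split.lo + 1
        self.current = None

    def next_key_value(self):
        """Advance to the next row; return False when the split is done."""
        try:
            self.current = next(self._rows)
        except StopIteration:
            return False
        self._read += 1
        return True

    def get_progress(self):
        """Fraction of the split consumed so far, between 0.0 and 1.0."""
        if self._expected <= 0:
            return 1.0
        return min(1.0, self._read / self._expected)


if __name__ == "__main__":
    reader = RecordReader(Split(1, 4), lambda s: range(s.lo, s.hi + 1))
    while reader.next_key_value():
        print(reader.current, reader.get_progress())
```

Progress here assumes one row per key value in the range; a real reader would need a better estimate when keys are sparse.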




Extensions to native SQOOP
• MERGE
  functionality
 • Update if
   exists, insert
   otherwise

• Hive
  connector
 • Source defined as
   HQL query rather
   than HDFS
   directory

• Eclipse UI
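
The MERGE behaviour listed above (update if the row exists, insert otherwise) can be sketched as follows. This is an illustration only: it uses SQLite from Python's standard library as a stand-in database, a hypothetical `sales` table, and a hypothetical helper `merge_rows`. Oracle provides a MERGE statement for this purpose; the slides do not show how the connector uses it.

```python
import sqlite3


def merge_rows(conn, rows):
    """Apply (id, amount) rows: update matching ids, insert the rest.

    Returns a tuple (updated_count, inserted_count).
    """
    updated = inserted = 0
    for row_id, amount in rows:
        cur = conn.execute(
            "UPDATE sales SET amount = ? WHERE id = ?", (amount, row_id)
        )
        if cur.rowcount == 0:
            # No existing row matched, so this is a new record.
            conn.execute(
                "INSERT INTO sales (id, amount) VALUES (?, ?)", (row_id, amount)
            )
            inserted += 1
        else:
            updated += 1
    conn.commit()
    return updated, inserted


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("INSERT INTO sales (id, amount) VALUES (1, 10.0)")
    counts = merge_rows(conn, [(1, 15.0), (2, 20.0)])
    rows = conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
    conn.close()
    return counts, rows


if __name__ == "__main__":
    print(demo())
```

Row-at-a-time statements keep the sketch readable; a bulk export would normally batch the work.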
Availability
• Apache-licensed source available from:
 https://github.com/QuestSoftwareTCD/OracleSQOOPconnector

• Download from (Quest):
 http://www.quest.com/hadoop/

• Download from (Cloudera):
 http://ccp.cloudera.com/display/SUPPORT/Downloads




Other SQOOP connectors
• Microsoft SQL Server:
 • http://www.microsoft.com/download/en/details.aspx?id=27584

• Teradata:
 • https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Teradata+User+Guide%2C+version+1.0-beta-u4

• MicroStrategy:
 • https://ccp.cloudera.com/display/con/MicroStrategy+Free+Download+License+Agreement

• Netezza:
 • https://ccp.cloudera.com/display/con/Netezza+Free+Download+License+Agreement

• VoltDB:
 • http://voltdb.com/company/blog/sqoop-voltdb-export-and-hadoop-integration




Other Hadoop – RDBMS integrations




Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
 • 48GB per node (864GB total)
 • 2x6 Core CPU per node (216 cores total)
 • 12x2TB HDD per node (216 spindles, 432 TB)
 • 40Gb/s Infiniband between nodes
 • 10Gb/s Ethernet to datacenter

• Apache Hadoop
• Oracle NoSQL
• Oracle loader for Hadoop
 • Multi-stage C-optimized unidirectional loader

                            www.oracle.com/us/bigdata/index.html
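[Diagram: Oracle product stack. Left: ORACLE BIG DATA APPLIANCE with ORACLE NOSQL and APACHE HADOOP. Middle: ORACLE EXALOGIC with ORACLE WEBLOGIC, and ORACLE EXADATA with ORACLE RDBMS. Right: ORACLE EXALYTICS with ORACLE ESSBASE and ORACLE TIMES TEN. ORACLE LOADER FOR HADOOP sits between the left and middle columns.]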
Microsoft




Hadapt
 • Formerly HadoopDB – Hadoop/Postgres hybrid
 • Postgres servers on data nodes allow for accelerated
   (indexed) Hive queries
 • Extensions to the Hive optimizer




http://www.hadapt.com/


Greenplum
• SQL-based access to HDFS data via in-DB MapReduce

http://www.greenplum.com/sites/default/files/EMC_Greenplum_Hadoop_DB_TB_0.pdf

Toad for Cloud Databases
• Federated SQL queries across
  Hive, HBase, NoSQL, RDBMS




Conclusions

• RDBMS-Hadoop interoperability is key to Enterprise
  Hadoop adoption
• SQOOP provides a good general-purpose framework
  for transferring data between any JDBC database and
  Hadoop
 • We’d like to see it become a standard
• Each RDBMS offers distinct tuning opportunities, so
  optimized SQOOP extensions offer real value
• Hadoop-RDBMS integration projects are proliferating
  rapidly

Red Hat Storage Day New York - New Reference Architectures
Red_Hat_Storage
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera, Inc.
 
Net flowhadoop flocon2013_yhlee_final
Net flowhadoop flocon2013_yhlee_finalNet flowhadoop flocon2013_yhlee_final
Net flowhadoop flocon2013_yhlee_final
Yeounhee Lee
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
outstanding59
 
Ad

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 
Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

Recently uploaded (20)

Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 

Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison, Quest Software

  • 1. Integrating Hadoop with Enterprise RDBMS Using Apache SQOOP and Other Tools Guy Harrison, Quest Software ©2011 Quest Software, Inc. All rights reserved..
  • 2. Introductions
  • 3.
  • 4.
  • 5. Agenda
      • Scenarios for RDBMS-Hadoop interaction
      • Case study: Quest extension to SQOOP
      • Other RDBMS-Hadoop integrations
  • 6. Hadoop meets RDBMS – scenarios
  • 7. Scenario #1: Reference data in RDBMS PRODUCTS CUSTOMERS HDFS WEBLOGS RDBMS
  • 8. Scenario #2: Hadoop for off-line analytics PRODUCTS CUSTOMERS HDFS SALES HISTORY RDBMS
  • 9. Scenario #3: MapReduce output to RDBMS DB QUERY TOOL WEBLOGS SUMMARY HDFS WEBLOGS RDBMS
  • 10. Scenario #4: Hadoop as RDBMS “active archive” QUERY TOOL SALES 2011 SALES 2010 SALES 2009 SALES 2009 SALES 2008 SALES 2008 HDFS RDBMS
  • 11. Case Study: extending SQOOP for Oracle
  • 12. SQOOP extensibility
      • SQOOP implements a generic approach to RDBMS/Hadoop data transfer
      • But database optimization is highly platform specific
      • Each RDBMS has distinct optimization strategies
      • For Oracle, optimization requires:
      • Bypassing Oracle caching layers
      • Avoiding Oracle optimizer meddling
      • Exploiting Oracle metadata to balance mapper load
  • 13. Reading from Oracle – default SQOOP ID > 0 and ID < MAX/2 ID > MAX/2 MAPPER MAPPER CACHE ORACLE SESSION ORACLE SESSION RANGE SCAN RANGE SCAN Index block Index block Index block Index block Index block Index block ORACLE TABLE
  • 14. Oracle – parallelism gone bad (1) Hadoop Mapper Hadoop Mapper Oracle HDFS SALES table Hadoop Mapper Hadoop Mapper
  • 15. Oracle – parallelism gone bad (2) HADOOP MAPPER HADOOP MAPPER HDFS ORACLE TABLE HADOOP MAPPER HADOOP MAPPER
  • 16. Ideal architecture HADOOP ORACLE MAPPER SESSION HADOOP ORACLE MAPPER SESSION HDFS ORACLE TABLE HADOOP ORACLE MAPPER SESSION HADOOP ORACLE MAPPER SESSION
  • 17. Design goals
      • Partition data based on physical storage
      • Bypass Oracle buffering
      • Bypass Oracle parallelism
      • Do not require or use indexes
      • Never read the same data block more than once
      • Support Oracle datatypes
  • 18. Import Throughput (chart: elapsed time (ms) against number of mappers, for SQOOP and SQOOP with Quest Connector)
  • 19. 16 mappers, 50M rows, 50 GB clustered data. Pct reduction: IO time 98.71; IO requests 99.08; Network round trips 98.95; CPU time 89.72; Elapsed time 80.84
  • 20. Export Throughput (chart: seconds against number of mappers, for SQOOP and SQOOP with Quest Connector)
  • 21. Export load (chart: database time (s) against number of mappers, for SQOOP and SQOOP with Quest Connector)
  • 22. Working with the SQOOP framework
      • SQOOP lets you concentrate on the RDBMS logic, not the Hadoop plumbing:
      • Extend ManagerFactory (what to handle)
      • Extend ConnManager (DB connection and metadata)
      • For imports:
      • Extend DataDrivenDBInputFormat (gets the data)
      • Data allocation (getSplits())
      • Split serialization (“io.serializations” property)
      • Data access logic (createDBRecordReader(), getSelectQuery())
      • Implement progress (nextKeyValue(), getProgress())
      • Similar procedure for extending exports
  • 23. Extensions to native SQOOP
      • MERGE functionality
      • Update if exists, insert otherwise
      • Hive connector
      • Source defined as HQL query rather than HDFS directory
      • Eclipse UI
  • 24. Availability
      • Apache licensed source available from: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector
      • Download from (Quest): http://www.quest.com/hadoop/
      • Download from (Cloudera): http://ccp.cloudera.com/display/SUPPORT/Downloads
  • 25. Other SQOOP connectors
      • Microsoft SQL Server: http://www.microsoft.com/download/en/details.aspx?id=27584
      • Teradata: https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Teradata+User+Guide%2C+version+1.0-beta-u4
      • MicroStrategy: https://ccp.cloudera.com/display/con/MicroStrategy+Free+Download+License+Agreement
      • Netezza: https://ccp.cloudera.com/display/con/Netezza+Free+Download+License+Agreement
      • VoltDB: http://voltdb.com/company/blog/sqoop-voltdb-export-and-hadoop-integration
  • 26. Other Hadoop – RDBMS integrations
  • 27. Oracle Big Data Appliance
      • 18 Sun X4270 M2 servers
      • 48GB per node (864GB total)
      • 2x6 Core CPU per node (216 total)
      • 12x2TB HDD per node (216 spindles, 864 TB)
      • 40Gb/s Infiniband between nodes
      • 10Gb/s Ethernet to datacenter
      • Apache Hadoop
      • Oracle NoSQL
      • Oracle loader for Hadoop
      • Multi-stage C-optimized unidirectional loader
      www.oracle.com/us/bigdata/index.html
  • 28. ORACLE ORACLE ORACLE BIG DATA EXALOGIC EXALYTICS APPLIANCE ORACLE WEBLOGIC ORACLE ORACLE NOSQL ESSBASE ORACLE ORACLE EXADATA LOADER FOR HADOOP APACHE ORACLE HADOOP ORACLE RDBMS TIMES TEN
  • 29. Microsoft
  • 30. Hadapt
      • Formerly HadoopDB – Hadoop/Postgres hybrid
      • Postgres servers on data nodes allow for accelerated (indexed) Hive queries
      • Extensions to the Hive optimizer
      http://www.hadapt.com/
  • 31. Greenplum
      • SQL based access to HDFS data via in-DB MapReduce
      http://www.greenplum.com/sites/default/files/EMC_Greenplum_Hadoop_DB_TB_0.pdf
  • 32. Toad for Cloud Databases
      • Federated SQL queries across Hive, HBase, NoSQL, RDBMS
  • 33. Conclusions
      • RDBMS-Hadoop interoperability is key to Enterprise Hadoop adoption
      • SQOOP provides a good general purpose framework for transferring data between any JDBC database and Hadoop
      • We’d like to see it become a standard
      • Each RDBMS offers distinct tuning opportunities, so optimized SQOOP extensions offer real value
      • Hadoop-RDBMS integration projects are proliferating rapidly
  • 34.
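
Slide 13 shows the default import giving each mapper a range of ID values, which Oracle then resolves with an index range scan. The sketch below illustrates how such per-mapper ranges might be derived from a minimum and maximum key. It is an assumption-laden illustration in Python, not SQOOP's own code: the function name range_splits, its parameters, and the exact shape of the generated predicates are all hypothetical.

```python
def range_splits(min_id, max_id, num_mappers, column="ID"):
    """Return one WHERE-clause predicate per mapper covering [min_id, max_id].

    Hypothetical helper for illustration only; not part of SQOOP.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1
    # Never create more splits than there are key values.
    num = min(num_mappers, total)

    clauses = []
    lo = min_id
    for i in range(num):
        # Spread any remainder over the first few splits.
        size = total // num + (1 if i < total % num else 0)
        hi = lo + size  # exclusive upper bound
        clauses.append(f"{column} >= {lo} AND {column} < {hi}")
        lo = hi
    return clauses


if __name__ == "__main__":
    for clause in range_splits(1, 100, 2):
        print(clause)
```

Equal key ranges do not guarantee equal row counts when the key values are sparse or skewed, which is one reason the design goals on slide 17 partition on physical storage instead.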
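
The design goals on slide 17 call for partitioning on physical storage so that every data block is read exactly once and mapper load is balanced. One way to picture this is a greedy assignment of storage chunks to mappers. The sketch below assumes the chunk list (identifier and block count) has already been obtained, for instance from database metadata; the names assign_extents, the sample extents, and the greedy strategy itself are assumptions for illustration and are not taken from the Quest connector.

```python
import heapq


def assign_extents(extents, num_mappers):
    """Assign each (extent_id, num_blocks) pair to exactly one mapper.

    Greedy strategy: hand the largest remaining extent to the mapper
    with the fewest blocks so far. Hypothetical sketch only.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")

    # Min-heap of (blocks_assigned_so_far, mapper_index).
    heap = [(0, m) for m in range(num_mappers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_mappers)]

    for extent_id, blocks in sorted(extents, key=lambda e: e[1], reverse=True):
        load, mapper = heapq.heappop(heap)
        assignment[mapper].append(extent_id)
        heapq.heappush(heap, (load + blocks, mapper))
    return assignment


# Made-up extents: (identifier, number of blocks).
sample_extents = [("e1", 128), ("e2", 128), ("e3", 64), ("e4", 64), ("e5", 8)]
assignment = assign_extents(sample_extents, 2)

if __name__ == "__main__":
    for mapper, extent_ids in enumerate(assignment):
        print(mapper, extent_ids)
```

Because every extent lands in exactly one mapper's list, no block is read twice, and no index is needed to decide which mapper reads what.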
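
Slide 23 describes the MERGE extension as "update if exists, insert otherwise". The sketch below demonstrates those semantics only. It uses Python's built-in sqlite3 module as a stand-in for Oracle, with a hypothetical table named target and a hypothetical helper merge_row; it is not how the connector implements MERGE, which the slides do not show.

```python
import sqlite3


def merge_row(conn, key, value):
    """Update the row with this key if it exists, otherwise insert it."""
    cur = conn.execute("UPDATE target SET value = ? WHERE id = ?", (value, key))
    if cur.rowcount == 0:
        # No existing row matched the key, so insert a new one.
        conn.execute("INSERT INTO target (id, value) VALUES (?, ?)", (key, value))


# In-memory database with one pre-existing row (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")

# Key 1 already exists and is updated; key 2 is new and is inserted.
for key, value in [(1, "new"), (2, "added")]:
    merge_row(conn, key, value)
conn.commit()

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()

if __name__ == "__main__":
    print(rows)
```

A database with a native MERGE or upsert statement can do the same in a single statement per batch, which avoids the extra round trip of the update-then-insert pattern shown here.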

Editor's Notes

  • #3: Insanely popular – literally millions of users