Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Zohar Elkayam
www.realdbamagic.com
@realmgic
Things Every Oracle DBA
Needs to Know about the
Hadoop Ecosystem
Who am I?
• Zohar Elkayam, CTO at Brillix
• Programmer, DBA, team leader, database trainer, public
speaker, and a senior consultant for over 19 years
• Oracle ACE Associate
• Member of ilOUG – Israel Oracle User Group
• Involved with Big Data projects since 2011
• Blogger – www.realdbamagic.com and www.ilDBA.co.il
3 http://brillix.co.il
About Brillix
• We offer complete, integrated end-to-end solutions based on best-of-
breed innovations in database, security and big data technologies
• We provide complete end-to-end 24x7 expert remote database
services
• We offer professional, customized on-site training, delivered by our
top-notch, world-recognized instructors
4
Some of Our Customers
http://brillix.co.il5
Agenda
• What is the Big Data challenge?
• A Big Data Solution: Apache Hadoop
• HDFS
• MapReduce and YARN
• Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools
• Another Big Data Solution: Apache Spark
• Where does the DBA fit in?
http://brillix.co.il6
The Challenge
7
The Big Data Challenge
http://brillix.co.il8
Volume
• Big data comes in one size: Big.
• Size is measured in terabytes (10^12 bytes), petabytes (10^15),
exabytes (10^18), and zettabytes (10^21)
• Storing and handling the data becomes an issue
• Producing value out of the data in a reasonable time is an
issue
http://brillix.co.il9
Variety
• Big Data extends beyond structured data, including semi-structured
and unstructured information: logs, text, audio, and video
• Wide variety of rapidly evolving data types requires highly flexible
stores and handling
http://brillix.co.il10
Unstructured         | Structured
Objects              | Tables
Flexible             | Columns and Rows
Structure Unknown    | Predefined Structure
Textual and Binary   | Mostly Textual
Velocity
•The speed at which data is being generated and
collected
•Streaming data and large volume data movement
•High velocity of data capture – requires rapid
ingestion
•Might cause a backlog problem
http://brillix.co.il11
Value
Big data is not about the size of the data,
It’s about the value within the data
http://brillix.co.il12
So, We Define a Big Data Problem…
• When the data is too big or moves too fast to handle in a
sensible amount of time
• When the data doesn’t fit any conventional database
structure
• When we think that we can still produce value from that
data and want to handle it
• When the technical solution to the business need becomes
part of the problem
http://brillix.co.il13
How to do Big Data
14
15
Big Data in Practice
•Big data is big: technological framework and
infrastructure solutions are needed
•Big data is complicated:
• We need developers to manage handling of the data
• We need DevOps engineers to manage the clusters
• We need data analysts and data scientists to produce
value
http://brillix.co.il16
Possible Solutions: Scale Up
• Older solution: using a giant server with a lot of resources
(scale up: more cores, faster processors, more memory) to
handle the data
• Process everything on a single server with hundreds of CPU
cores
• Use lots of memory (1+ TB)
• Have a huge data store on high end storage solutions
• Data needs to be copied to the processes in real time, so
it’s not practical for large amounts of data (terabytes to
petabytes)
http://brillix.co.il17
Another Solution: Distributed Systems
•A scale-out solution: let’s use distributed systems:
use multiple machines for a single job/application
•More machines mean more resources
• CPU
• Memory
• Storage
•But the solution is still complicated: infrastructure
and frameworks are needed
http://brillix.co.il18
Distributed Infrastructure Challenges
• We need infrastructure that is built for:
• Large scale
• Linear scale-out ability
• Data-intensive jobs that spread the problem across clusters of
server nodes
• Storage: efficient and cost-effective enough to capture and
store terabytes, if not petabytes, of data
• Network infrastructure that can quickly import large data
sets and then replicate them to various nodes for processing
• High-end hardware is too expensive - we need a solution
that uses cheaper hardware
http://brillix.co.il19
Distributed System/Frameworks Challenges
•How do we distribute workload across the system?
•Programming complexity – keeping the data in sync
•What to do with faults and redundancy?
•How do we handle security demands to protect
highly-distributed infrastructure and data?
http://brillix.co.il20
A Big Data Solution:
Apache Hadoop
21
Apache Hadoop
•Open source project run by the Apache Software Foundation (since 2006)
•Hadoop brings the ability to cheaply process large
amounts of data, regardless of its structure
•It has been the driving force behind the growth of
the big data industry
•Get the public release from:
• http://hadoop.apache.org/core/
http://brillix.co.il22
Original Hadoop Components
•HDFS (Hadoop Distributed File System) – distributed
file system that runs in clustered environments
•MapReduce – programming paradigm for running
processes over clustered environments
•Hadoop’s main idea: let’s distribute the data to many
servers, and then bring the program to the data
http://brillix.co.il23
Hadoop Benefits
•Designed for scale out
•Reliable solution based on unreliable hardware
•Load data first, structure later
•Designed for storing large files
•Designed to maximize throughput of large scans
•Designed to leverage parallelism
•Solution Ecosystem
24 http://brillix.co.il
What Hadoop Is Not
• Hadoop is not a database – it is not a replacement for a
DW or other relational databases
• Hadoop is not commonly used for OLTP/real-time systems
• Very good for large amounts of data, not so much for smaller sets
• Designed for clusters – there is no Hadoop monster server
(single server)
http://brillix.co.il25
Hadoop Limitations
•Hadoop is scalable but it’s not fast
•Some assembly may be required
•Batteries are not included (DIY mindset) – some
features need to be developed if they’re not available
•Open source license limitations apply
•Technology is changing very rapidly
http://brillix.co.il26
Hadoop under the Hood
27
Original Hadoop 1.0 Components
• HDFS (Hadoop Distributed File System) – distributed file
system that runs in a clustered environment
• MapReduce – programming technique for running
processes over a clustered environment
28 http://brillix.co.il
Hadoop 2.0
• Hadoop 2.0 changed the original Hadoop design and
introduced a better resource management concept:
• Hadoop Common
• HDFS
• YARN
• Multiple data processing
frameworks including
MapReduce, Spark and
others
http://brillix.co.il29
HDFS is...
• A distributed file system
• Designed to reliably store data using commodity hardware
• Designed to expect hardware failures and still stay resilient
• Intended for larger files
• Designed for batch inserts and appending data (no updates)
http://brillix.co.il30
Files and Blocks
•Files are split into 128 MB blocks (the single unit of
storage)
• Managed by NameNode and stored on DataNodes
• Transparent to users
•Replicated across machines at load time
• Same block is stored on multiple machines
• Good for fault-tolerance and access
• Default replication factor is 3
31 http://brillix.co.il
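For illustration only – a minimal command-line sketch (assuming a running cluster and a hypothetical /user/sample/hamlet.txt file) of how block placement and replication can be inspected and changed:

# Show how the file is split into blocks and where each replica lives
$ hdfs fsck /user/sample/hamlet.txt -files -blocks -locations

# Change the replication factor of an existing file (-w waits for completion)
$ hdfs dfs -setrep -w 2 /user/sample/hamlet.txt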
HDFS Node Types
HDFS has three types of nodes
• DataNodes
• Responsible for the actual storage of file blocks
• Serve file data to clients
• NameNode (master node)
• Distributes file blocks across the cluster
• Responsible for replication between the DataNodes
and for tracking file block locations
• BackupNode
• A backup of the NameNode
http://brillix.co.il32
HDFS is Good for...
•Storing large files
• Terabytes, Petabytes, etc...
• Millions rather than billions of files
• 128MB or more per file
•Streaming data
• Write once and read-many times patterns
• Optimized for streaming reads rather than random reads
33 http://brillix.co.il
HDFS is Not So Good For...
• Low-latency reads / Real-time application
• High-throughput rather than low latency for small chunks of
data
• HBase addresses this issue
• Large amounts of small files
• Better for millions of large files instead of billions of small files
• Multiple writers
• Single writer per file
• Writes only at the end of a file; no support for arbitrary offsets
34 http://brillix.co.il
Using HDFS in Command Line
http://brillix.co.il35
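The original slide shows a terminal screenshot; as a minimal sketch (paths and file names are hypothetical), the most common hdfs dfs commands look like this:

$ hdfs dfs -mkdir -p /user/sample          # create a directory in HDFS
$ hdfs dfs -put hamlet.txt /user/sample/   # copy a local file into HDFS
$ hdfs dfs -ls /user/sample                # list directory contents
$ hdfs dfs -cat /user/sample/hamlet.txt    # print a file to stdout
$ hdfs dfs -get /user/sample/hamlet.txt .  # copy a file back to the local disk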
What HDFS Looks Like (GUI)
http://brillix.co.il36
Interfacing with HDFS
http://brillix.co.il37
MapReduce is...
• A programming model for expressing distributed
computations at a massive scale
• An execution framework for organizing and performing
such computations
• MapReduce can be written in Java, Scala, C, Python, Ruby
and others
• Concept: Bring the code to the data, not the data to the
code
http://brillix.co.il38
The MapReduce Paradigm
• Imposes key-value input/output
• We implement two main functions:
• MAP - takes a large problem, divides it into sub-problems, and
performs the same function on all sub-problems
Map(k1, v1) -> list(k2, v2)
• REDUCE - combines the output from all sub-problems (each key goes to
the same reducer)
Reduce(k2, list(v2)) -> list(v3)
• Framework handles everything else (almost)
39 http://brillix.co.il
MapReduce Word Count Process
40
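The original slide shows the word count flow as a diagram. As a hedged sketch of the same idea, here is a hypothetical word count written in Python for Hadoop Streaming (the file names mapper.py and reducer.py are made up, and the streaming jar path depends on the distribution):

# mapper.py - reads text lines from stdin, emits "word<TAB>1" pairs
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py - input arrives sorted by key, so counts can be summed per word
import sys

current, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")

A run would then look roughly like:
$ hadoop jar /path/to/hadoop-streaming.jar -files mapper.py,reducer.py \
    -mapper "python mapper.py" -reducer "python reducer.py" \
    -input /user/sample/hamlet.txt -output /user/sample/wordcount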
YARN
•Takes care of distributed processing and coordination
•Scheduling
• Jobs are broken down into smaller chunks called tasks
• These tasks are scheduled to run on data nodes
•Task Localization with Data
• Framework strives to place tasks on the nodes that host
the segment of data to be processed by that specific task
• Code is moved to where the data is
41 http://brillix.co.il
YARN
•Error Handling
• Failures are an expected behavior so tasks are
automatically re-tried on other machines
•Data Synchronization
• Shuffle and Sort barrier re-arranges and moves data
between machines
• Input and output are coordinated by the framework
42 http://brillix.co.il
YARN Framework Support
•With YARN, we can go beyond the Hadoop ecosystem
•Support different frameworks:
• MapReduce v2
• Spark
• Giraph
• Co-Processors for Apache HBase
• More…
http://brillix.co.il43
Submitting a Job
•The yarn script, given a JAR and a class argument, launches
a JVM and executes the provided job
$ yarn jar HadoopSamples.jar mr.wordcount.StartsWithCountJob \
    /user/sample/hamlet.txt \
    /user/sample/wordcount/
http://brillix.co.il44
Resource Manager UI
http://brillix.co.il45
Application View
http://brillix.co.il46
Hadoop Main Problems
• Hadoop MapReduce Framework (not MapReduce
paradigm) had some major problems:
• Developing MapReduce was complicated – there was more
than just business logic to develop
• Transferring data between stages requires the intermediate
data to be written to disk (and then read by the next step)
• Multi-step jobs needed orchestration and abstraction solutions
• Initial resource management was very painful – the MapReduce
framework was based on resource slots
http://brillix.co.il47
Extending Hadoop
The Hadoop Ecosystem
Improving Hadoop: Distributions
• Core Hadoop is complicated so some tools and solution
frameworks were added to make things easier
• There are over 80 different Apache projects for big data
solutions that use Hadoop (and growing!)
• Hadoop distributions collect some of these tools and
release them as a complete, integrated package
• Good for integration – still open source; we pay for
support and integration
49
Noticeable Distributions (on-premise)
•Cloudera
•Hortonworks
•MapR
•IBM InfoSphere
http://brillix.co.il50
Noticeable Distributions (cloud)
•Amazon Web Services (S3 + EC2)
•Microsoft Azure
•Google Cloud Platform
•IBM InfoSphere BigInsights
http://brillix.co.il51
Common Hadoop 2.0 Technology Ecosystem
52 http://brillix.co.il
Improving Programmability
•MapReduce code in Java is sometimes tedious, so
different solutions came to the rescue
• Pig: a programming language that simplifies Hadoop
actions: loading, transforming and sorting data
• Hive: enables Hadoop to operate as a data warehouse using
SQL-like syntax
• Spark and other frameworks
http://brillix.co.il53
Pig
• Pig is an abstraction on top of Hadoop
• Provides a high-level programming language designed for data
processing
• Scripts are converted into MapReduce code and executed on the
Hadoop cluster
• Makes ETL/ELT processing and other simple MapReduce jobs easier
without writing MapReduce code (a short script sketch follows below)
• Pig was widely accepted and used by Yahoo!, Twitter, Netflix,
and others
• Often replaced by more up-to-date tools like Apache Spark
http://brillix.co.il54
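As a hedged illustration (the input path and output directory are made up), a complete Pig script for the same word count, with no MapReduce code in sight:

-- load raw text, split it into words, and count occurrences per word
lines  = LOAD '/user/sample/hamlet.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
STORE counts INTO '/user/sample/pig_wordcount';

Pig compiles this script into one or more MapReduce (or Tez) jobs and runs them on the cluster.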
Hive
• Data Warehousing Solution built on top of Hadoop
• Provides a SQL-like query language named HiveQL
• Minimal learning curve for people with SQL expertise
• Data analysts are target audience
• Early Hive development work started at Facebook in 2007
• Hive is an Apache top level project under
Hadoop
• http://hive.apache.org
http://brillix.co.il55
Hive Provides
•Ability to bring structure to various data formats
•Simple interface for ad hoc querying, analyzing and
summarizing large amounts of data
•Access to files on various data stores such as HDFS
and HBase
•Also see: Apache Impala (mainly in Cloudera)
56
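For illustration, a hedged HiveQL sketch (the table name, columns, and HDFS path are hypothetical) that puts a schema on top of files already sitting in HDFS and then queries them with plain SQL:

-- map a schema onto tab-delimited files that already live in HDFS
CREATE EXTERNAL TABLE access_log (
  ip     STRING,
  ts     STRING,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/sample/access_log';

-- ad hoc aggregation; Hive turns this into MapReduce/Tez/Spark jobs
SELECT status, COUNT(*) AS hits
FROM access_log
GROUP BY status
ORDER BY hits DESC;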
Databases and DB Connectivity
•HBase: an online NoSQL key/value, wide-column
datastore that is native to HDFS
•Sqoop: a tool designed to import data from relational
databases into HDFS, HBase, or Hive, and to export it back
•Sqoop2: a centralized Sqoop service (GUI, Web UI,
REST)
57
HBase
• HBase is the closest thing we had to a
database in the early Hadoop days
• A distributed key/value, wide-column oriented NoSQL
database, built on top of HDFS
• Provides BigTable-like capabilities
• Does not have a query language: only get, put, and scan
commands (see the shell sketch below)
• Often compared with Cassandra
(a non-Hadoop-native Apache project)
and Aerospike
http://brillix.co.il58
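A minimal sketch of that get/put/scan interface from the HBase shell (table, column family, and row keys are hypothetical):

create 'users', 'info'                                # table with one column family
put 'users', 'user1001', 'info:name', 'Alice'         # write one cell by row key
get 'users', 'user1001'                               # read a whole row by key
scan 'users', {STARTROW => 'user1000', LIMIT => 10}   # range scan over row keys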
When Do We Use HBase?
•Huge volumes of randomly accessed data
•HBase is at its best when it’s accessed in a
distributed fashion by many clients (high
consistency)
•Consider HBase when we are loading data by key,
searching data by key (or range), serving data by key,
querying data by key or when storing data by row that
doesn’t conform well to a schema.
59
When NOT To Use HBase
•HBase doesn’t use SQL, doesn’t have an optimizer, and
doesn’t support transactions or joins
•HBase doesn’t have data types
•See project Apache Phoenix for better data structure
and query language when using HBase
60
Sqoop and Sqoop2
• Sqoop is a command line tool for moving data
from RDBMS to Hadoop. Sqoop2 is a centralized
tool for running Sqoop.
• Uses MapReduce to load the data from relational databases into HDFS
• Can also export data from Hadoop back to the RDBMS
• Comes with connectors to MySQL, PostgreSQL, Oracle, SQL
Server and DB2.
$ bin/sqoop import --connect \
  'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' \
  --table lineitem --hive-import
$ bin/sqoop export --connect \
  'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' \
  --table lineitem --export-dir /data/lineitemData
http://brillix.co.il61
Improving Hadoop – More Useful Tools
•For improving coordination: ZooKeeper
•For improving scheduling/orchestration: Oozie
•Interactive SQL queries over HDFS data: Apache Impala
•For improving log collection: Flume
•Text search and data discovery: Solr
•For improving UI and dashboards: Hue and Ambari
http://brillix.co.il62
Improving Hadoop – More Useful Tools (2)
•Data serialization: Avro (rows) and Parquet (columns)
•Data governance: Atlas
•Security: Knox and Ranger
•Data Replication: Falcon
•Machine Learning: Mahout
•Performance Improvement: Tez
•And there are more…
http://brillix.co.il63
64
Is Hadoop the Only Big Data Solution?
• No – there are other solutions:
• Apache Spark and Apache Mesos frameworks
• NoSQL systems (Apache Cassandra, CouchBase, MongoDB
and many others)
• Stream analysis (Apache Kafka, Apache Storm, Apache Flink)
• Machine learning (Apache Mahout, Spark MLlib)
• Some can be integrated with Hadoop, but some are
independent
http://brillix.co.il65
Another Big Data Solution: Apache Spark
•Apache Spark is a fast, general engine for
large-scale data processing on a cluster
•Originally developed at UC Berkeley in 2009 as a
research project, and is now an open source Apache
top level project
•Main idea: use the memory resources of the cluster
for better performance
•It is one of the fastest-growing projects today
http://brillix.co.il66
The Spark Stack
http://brillix.co.il67
Spark and Hadoop
• Spark and Hadoop are built to co-exist – Spark interacts with
the Hadoop ecosystem: Flume, Sqoop, HBase, Hive
• Spark can use other storage systems (S3, local disks, NFS)
but works best when combined with HDFS
• Spark can use YARN for running jobs
• Spark can also interact with tools outside the Hadoop
ecosystem: Kafka, NoSQL stores (Cassandra, Aerospike, etc.),
relational databases, and more (see the PySpark sketch below)
68
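As a hedged PySpark sketch (the HDFS paths are hypothetical), the familiar word count again, this time keeping intermediate data in cluster memory instead of writing it to disk between stages:

from pyspark.sql import SparkSession

# On a Hadoop cluster this session would typically be submitted to YARN
spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///user/sample/hamlet.txt")      # read from HDFS
counts = (lines.flatMap(lambda line: line.split())         # split lines into words
               .map(lambda word: (word, 1))                # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))           # sum counts per word
counts.saveAsTextFile("hdfs:///user/sample/spark_wordcount")
spark.stop()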
Okay, So Where Does the DBA Fit In?
• Big Data solutions are not databases. Databases are
probably not going to disappear, but we feel the change
even today: DBAs must be ready for the change
• DBAs are the perfect candidates to transition into Big Data
experts:
• They have system (OS, disk, memory, hardware) experience
• They can understand data easily
• DBAs are used to working with developers and other data users
http://brillix.co.il69
What Do DBAs Need Now?
•DBAs will need to know more programming: Java,
Scala, Python, R or any other popular language in the
Big Data world will do
•DBAs need to understand the position shifts and
the introduction of DevOps, data scientists, CDOs, etc.
•Big Data is changing daily: we need to learn, read, and
be involved before we are left behind…
http://brillix.co.il70
Summary
• Big Data is here – it’s complicated, and RDBMS alone does not fit
anymore
• Big Data solutions are evolving; Hadoop is an example of
such a solution
• Spark is a very popular Big Data solution
• DBAs need to be ready for the change: Big Data solutions
are not databases, and we should make ourselves ready
http://brillix.co.il71
Q&A
http://brillix.co.il72
Thank You
and don’t forget to evaluate!
Zohar Elkayam
twitter: @realmgic
Zohar@Brillix.co.il
www.realdbamagic.com
http://brillix.co.il73
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527
Ad

More Related Content

What's hot (20)

2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud
Marcus Vinicius Miguel Pedro
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12c
Zohar Elkayam
 
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
Marcus Vinicius Miguel Pedro
 
Adding real time reporting to your database oracle db in memory
Adding real time reporting to your database oracle db in memoryAdding real time reporting to your database oracle db in memory
Adding real time reporting to your database oracle db in memory
Zohar Elkayam
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Cuneyt Goksu
 
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
Lucas Jellema
 
A3 transforming data_management_in_the_cloud
A3 transforming data_management_in_the_cloudA3 transforming data_management_in_the_cloud
A3 transforming data_management_in_the_cloud
Dr. Wilfred Lin (Ph.D.)
 
SQL Server on Linux - march 2017
SQL Server on Linux - march 2017SQL Server on Linux - march 2017
SQL Server on Linux - march 2017
Sorin Peste
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
Mike Broberg
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Filipe Silva
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Cloudera, Inc.
 
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Lucas Jellema
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Jason Strate
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
Raul Chong
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa Neddam
Romeo Kienzler
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWS
EDB
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Lucas Jellema
 
Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2
Ajay Kumar Uppal
 
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
Lucas Jellema
 
MySQL 5.7 New Features for Developers
MySQL 5.7 New Features for DevelopersMySQL 5.7 New Features for Developers
MySQL 5.7 New Features for Developers
Zohar Elkayam
 
2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud
Marcus Vinicius Miguel Pedro
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12c
Zohar Elkayam
 
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
2019 - GUOB Tech Day / Groundbreakers LAD Tour - Database Migration Methods t...
Marcus Vinicius Miguel Pedro
 
Adding real time reporting to your database oracle db in memory
Adding real time reporting to your database oracle db in memoryAdding real time reporting to your database oracle db in memory
Adding real time reporting to your database oracle db in memory
Zohar Elkayam
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Cuneyt Goksu
 
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
Lucas Jellema
 
A3 transforming data_management_in_the_cloud
A3 transforming data_management_in_the_cloudA3 transforming data_management_in_the_cloud
A3 transforming data_management_in_the_cloud
Dr. Wilfred Lin (Ph.D.)
 
SQL Server on Linux - march 2017
SQL Server on Linux - march 2017SQL Server on Linux - march 2017
SQL Server on Linux - march 2017
Sorin Peste
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
Mike Broberg
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Filipe Silva
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Cloudera, Inc.
 
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Oracle OpenWorld 2016 Review - High Level Overview of major themes and grand ...
Lucas Jellema
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Jason Strate
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
Raul Chong
 
Cloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa NeddamCloudant Overview Bluemix Meetup from Lisa Neddam
Cloudant Overview Bluemix Meetup from Lisa Neddam
Romeo Kienzler
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWS
EDB
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Lucas Jellema
 
Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2
Ajay Kumar Uppal
 
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
What is the Oracle PaaS Cloud for Developers (Oracle Cloud Day, The Netherlan...
Lucas Jellema
 
MySQL 5.7 New Features for Developers
MySQL 5.7 New Features for DevelopersMySQL 5.7 New Features for Developers
MySQL 5.7 New Features for Developers
Zohar Elkayam
 

Similar to Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527 (20)

Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
Zohar Elkayam
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Big data applications
Big data applicationsBig data applications
Big data applications
Juan Pablo Paz Grau, Ph.D., PMP
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
arslanhaneef
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
sonukumar379092
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
DataWorks Summit
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
Learntek1
 
002 Introduction to hadoop v3
002   Introduction to hadoop v3002   Introduction to hadoop v3
002 Introduction to hadoop v3
Dendej Sawarnkatat
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Prashanth Yennampelli
 
Hadoop Eco system
Hadoop Eco systemHadoop Eco system
Hadoop Eco system
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
Cloudera, Inc.
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Venneladonthireddy1
 
MODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptxMODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptx
NiramayKolalle
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
Kunal Khanna
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
KMS Technology
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
Zohar Elkayam
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
Learntek1
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
Cloudera, Inc.
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Venneladonthireddy1
 
MODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptxMODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptx
NiramayKolalle
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
Kunal Khanna
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
KMS Technology
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Ad

More from Zohar Elkayam (17)

Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Zohar Elkayam
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme Performance
Zohar Elkayam
 
The art of querying – newest and advanced SQL techniques
The art of querying – newest and advanced SQL techniquesThe art of querying – newest and advanced SQL techniques
The art of querying – newest and advanced SQL techniques
Zohar Elkayam
 
Oracle Advanced SQL and Analytic Functions
Oracle Advanced SQL and Analytic FunctionsOracle Advanced SQL and Analytic Functions
Oracle Advanced SQL and Analytic Functions
Zohar Elkayam
 
Oracle 12c New Features For Better Performance
Oracle 12c New Features For Better PerformanceOracle 12c New Features For Better Performance
Oracle 12c New Features For Better Performance
Zohar Elkayam
 
Advanced PL/SQL Optimizing for Better Performance 2016
Advanced PL/SQL Optimizing for Better Performance 2016Advanced PL/SQL Optimizing for Better Performance 2016
Advanced PL/SQL Optimizing for Better Performance 2016
Zohar Elkayam
 
Oracle Database Advanced Querying (2016)
Oracle Database Advanced Querying (2016)Oracle Database Advanced Querying (2016)
Oracle Database Advanced Querying (2016)
Zohar Elkayam
 
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
OOW2016: Exploring Advanced SQL Techniques Using Analytic FunctionsOOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?
Zohar Elkayam
 
Exploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic FunctionsExploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Exploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic FunctionsExploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Advanced PLSQL Optimizing for Better Performance
Advanced PLSQL Optimizing for Better PerformanceAdvanced PLSQL Optimizing for Better Performance
Advanced PLSQL Optimizing for Better Performance
Zohar Elkayam
 
Oracle Database Advanced Querying
Oracle Database Advanced QueryingOracle Database Advanced Querying
Oracle Database Advanced Querying
Zohar Elkayam
 
SQLcl the next generation of SQLPlus?
SQLcl the next generation of SQLPlus?SQLcl the next generation of SQLPlus?
SQLcl the next generation of SQLPlus?
Zohar Elkayam
 
Oracle Data Guard A to Z
Oracle Data Guard A to ZOracle Data Guard A to Z
Oracle Data Guard A to Z
Zohar Elkayam
 
Oracle Data Guard Broker Webinar
Oracle Data Guard Broker WebinarOracle Data Guard Broker Webinar
Oracle Data Guard Broker Webinar
Zohar Elkayam
 
Oracle Database In-Memory Option for ILOUG
Oracle Database In-Memory Option for ILOUGOracle Database In-Memory Option for ILOUG
Oracle Database In-Memory Option for ILOUG
Zohar Elkayam
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Zohar Elkayam
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme Performance
Zohar Elkayam
 
The art of querying – newest and advanced SQL techniques
The art of querying – newest and advanced SQL techniquesThe art of querying – newest and advanced SQL techniques
The art of querying – newest and advanced SQL techniques
Zohar Elkayam
 
Oracle Advanced SQL and Analytic Functions
Oracle Advanced SQL and Analytic FunctionsOracle Advanced SQL and Analytic Functions
Oracle Advanced SQL and Analytic Functions
Zohar Elkayam
 
Oracle 12c New Features For Better Performance
Oracle 12c New Features For Better PerformanceOracle 12c New Features For Better Performance
Oracle 12c New Features For Better Performance
Zohar Elkayam
 
Advanced PL/SQL Optimizing for Better Performance 2016
Advanced PL/SQL Optimizing for Better Performance 2016Advanced PL/SQL Optimizing for Better Performance 2016
Advanced PL/SQL Optimizing for Better Performance 2016
Zohar Elkayam
 
Oracle Database Advanced Querying (2016)
Oracle Database Advanced Querying (2016)Oracle Database Advanced Querying (2016)
Oracle Database Advanced Querying (2016)
Zohar Elkayam
 
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
OOW2016: Exploring Advanced SQL Techniques Using Analytic FunctionsOOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?
Zohar Elkayam
 
Exploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic FunctionsExploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Exploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic FunctionsExploring Advanced SQL Techniques Using Analytic Functions
Exploring Advanced SQL Techniques Using Analytic Functions
Zohar Elkayam
 
Advanced PLSQL Optimizing for Better Performance
Advanced PLSQL Optimizing for Better PerformanceAdvanced PLSQL Optimizing for Better Performance
Advanced PLSQL Optimizing for Better Performance
Zohar Elkayam
 
Oracle Database Advanced Querying
Oracle Database Advanced QueryingOracle Database Advanced Querying
Oracle Database Advanced Querying
Zohar Elkayam
 
SQLcl the next generation of SQLPlus?
SQLcl the next generation of SQLPlus?SQLcl the next generation of SQLPlus?
SQLcl the next generation of SQLPlus?
Zohar Elkayam
 
Oracle Data Guard A to Z
Oracle Data Guard A to ZOracle Data Guard A to Z
Oracle Data Guard A to Z
Zohar Elkayam
 
Oracle Data Guard Broker Webinar
Oracle Data Guard Broker WebinarOracle Data Guard Broker Webinar
Oracle Data Guard Broker Webinar
Zohar Elkayam
 
Oracle Database In-Memory Option for ILOUG
Oracle Database In-Memory Option for ILOUGOracle Database In-Memory Option for ILOUG
Oracle Database In-Memory Option for ILOUG
Zohar Elkayam
 
Ad

Recently uploaded (20)

Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 

Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527

  • 2. Zohar Elkayam www.realdbamagic.com @realmgic Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem
  • 3. Who am I? • Zohar Elkayam, CTO at Brillix • Programmer, DBA, team leader, database trainer, public speaker, and a senior consultant for over 19 years • Oracle ACE Associate • Member of ilOUG – Israel Oracle User Group • Involved with Big Data projects since 2011 • Blogger – www.realdbamagic.com and www.ilDBA.co.il 3 http://brillix.co.il
  • 4. About Brillix • We offer complete, integrated end-to-end solutions based on best-of- breed innovations in database, security and big data technologies • We provide complete end-to-end 24x7 expert remote database services • We offer professional customized on-site trainings, delivered by our top-notch world recognized instructors 4
  • 5. Some of Our Customers http://brillix.co.il5
  • 6. Agenda • What is the Big Data challenge? • A Big Data Solution: Apache Hadoop • HDFS • MapReduce and YARN • Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools • Another Big Data Solution: Apache Spark • Where does the DBA fits in? http://brillix.co.il6
  • 8. The Big Data Challenge http://brillix.co.il8
  • 9. Volume • Big data comes in one size: Big. • Size is measured in Terabyte (1012), Petabyte (1015), Exabyte (1018), Zettabyte (1021) • The storing and handling of the data becomes an issue • Producing value out of the data in a reasonable time is an issue http://brillix.co.il9
  • 10. Variety • Big Data extends beyond structured data, including semi-structured and unstructured information: logs, text, audio and videos • Wide variety of rapidly evolving data types requires highly flexible stores and handling http://brillix.co.il10 Un-Structured Structured Objects Tables Flexible Columns and Rows Structure Unknown Predefined Structure Textual and Binary Mostly Textual
  • 11. Velocity •The speed in which data is being generated and collected •Streaming data and large volume data movement •High velocity of data capture – requires rapid ingestion •Might cause a backlog problem http://brillix.co.il11
  • 12. Value Big data is not about the size of the data, It’s about the value within the data http://brillix.co.il12
  • 13. So, We Define Big Data Problem… • When the data is too big or moves too fast to handle in a sensible amount of time • When the data doesn’t fit any conventional database structure • When we think that we can still produce value from that data and want to handle it • When the technical solution to the business need becomes part of the problem http://brillix.co.il13
  • 14. How to do Big Data 14
  • 15. 15
  • 16. Big Data in Practice •Big data is big: technological framework and infrastructure solutions are needed •Big data is complicated: • We need developers to manage handling of the data • We need devops to manage the clusters • We need data analysts and data scientists to produce value http://brillix.co.il16
  • 17. Possible Solutions: Scale Up • Older solution: using a giant server with a lot of resources (scale up: more cores, faster processers, more memory) to handle the data • Process everything on a single server with hundreds of CPU cores • Use lots of memory (1+ TB) • Have a huge data store on high end storage solutions • Data needs to be copied to the processes in real time, so it’s no good for high amounts of data (Terabytes to Petabytes) http://brillix.co.il17
  • 18. Another Solution: Distributed Systems •A scale-out solution: let’s use distributed systems: use multiple machine for a single job/application •More machines means more resources • CPU • Memory • Storage •But the solution is still complicated: infrastructure and frameworks are needed http://brillix.co.il18
  • 19. Distributed Infrastructure Challenges • We need Infrastructure that is built for: • Large-scale • Linear scale out ability • Data-intensive jobs that spread the problem across clusters of server nodes • Storage: efficient and cost-effective enough to capture and store terabytes, if not petabytes, of data • Network infrastructure that can quickly import large data sets and then replicate them to various nodes for processing • High-end hardware is too expensive – we need a solution that uses cheaper hardware http://brillix.co.il19
  • 20. Distributed System/Frameworks Challenges •How do we distribute workload across the system? •Programming complexity – keeping the data in sync •What to do with faults and redundancy? •How do we handle security demands to protect highly-distributed infrastructure and data? http://brillix.co.il20
  • 21. A Big Data Solution: Apache Hadoop 21
  • 22. Apache Hadoop •Open source project run by the Apache Foundation (since 2006) •Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure •It has been the driving force behind the growth of the big data industry •Get the public release from: • http://hadoop.apache.org/core/ http://brillix.co.il22
  • 23. Original Hadoop Components •HDFS (Hadoop Distributed File System) – distributed file system that runs in clustered environments •MapReduce – programming paradigm for running processes over clustered environments •Hadoop main idea: let’s distribute the data to many servers, and then bring the program to the data http://brillix.co.il23
  • 24. Hadoop Benefits •Designed for scale out •Reliable solution based on unreliable hardware •Load data first, structure later •Designed for storing large files •Designed to maximize throughput of large scans •Designed to leverage parallelism •Solution Ecosystem 24 http://brillix.co.il
  • 25. What Hadoop Is Not? • Hadoop is not a database – it is not a replacement for a data warehouse or other relational databases • Hadoop is not commonly used for OLTP/real-time systems • Very good for large data sets, not so much for smaller ones • Designed for clusters – there is no Hadoop monster server (single server) http://brillix.co.il25
  • 26. Hadoop Limitations •Hadoop is scalable but it’s not fast •Some assembly may be required •Batteries are not included (DIY mindset) – some features need to be developed if they’re not available •Open source license limitations apply •Technology is changing very rapidly http://brillix.co.il26
  • 27. Hadoop under the Hood 27
  • 28. Original Hadoop 1.0 Components • HDFS (Hadoop Distributed File System) – distributed file system that runs in a clustered environment • MapReduce – programming technique for running processes over a clustered environment 28 http://brillix.co.il
  • 29. Hadoop 2.0 • Hadoop 2.0 changed the Hadoop architecture and introduced a better resource management concept: • Hadoop Common • HDFS • YARN • Multiple data processing frameworks including MapReduce, Spark and others http://brillix.co.il29
  • 30. HDFS is... • A distributed file system • Designed to reliably store data using commodity hardware • Designed to expect hardware failures and still stay resilient • Intended for larger files • Designed for batch inserts and appending data (no updates) http://brillix.co.il30
  • 31. Files and Blocks •Files are split into 128MB blocks (single unit of storage) • Managed by NameNode and stored on DataNodes • Transparent to users •Replicated across machines at load time • Same block is stored on multiple machines • Good for fault-tolerance and access • Default replication factor is 3 31 http://brillix.co.il
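A quick way to see blocks and replication in practice – a sketch, assuming a running HDFS cluster and the sample file path used later in this deck:

$ hdfs fsck /user/sample/hamlet.txt -files -blocks -locations   # list the file's blocks and which DataNodes hold them
$ hdfs getconf -confKey dfs.blocksize                           # show the configured block size (128MB by default in Hadoop 2.x)
$ hdfs dfs -setrep -w 3 /user/sample/hamlet.txt                 # change the replication factor of one file and wait for it to finish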
  • 32. HDFS Node Types HDFS has three types of Nodes • DataNodes • Responsible for the actual file storage • Serve data from the file blocks to clients • NameNode (MasterNode) • Distributes file blocks across the cluster • Responsible for the replication between the DataNodes and for tracking file block locations • BackupNode • A backup of the NameNode http://brillix.co.il32
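To see the node types at work, the NameNode can report the live DataNodes and their capacity (assuming admin access to the cluster):

$ hdfs dfsadmin -report    # prints overall cluster capacity plus a section per live/dead DataNode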
  • 33. HDFS is Good for... •Storing large files • Terabytes, Petabytes, etc... • Millions rather than billions of files • 128MB or more per file •Streaming data • Write once and read-many times patterns • Optimized for streaming reads rather than random reads 33 http://brillix.co.il
  • 34. HDFS is Not So Good For... • Low-latency reads / Real-time applications • High throughput rather than low latency for small chunks of data • HBase addresses this issue • Large amounts of small files • Better for millions of large files instead of billions of small files • Multiple Writers • Single writer per file • Writes only at the end of a file, no support for writing at arbitrary offsets 34 http://brillix.co.il
  • 35. Using HDFS in Command Line http://brillix.co.il35
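The slide shows a terminal screenshot; a representative set of HDFS shell commands (illustrative paths, not taken from the slide) looks like this:

$ hdfs dfs -mkdir -p /user/sample                     # create a directory in HDFS
$ hdfs dfs -put hamlet.txt /user/sample/              # copy a local file into HDFS
$ hdfs dfs -ls /user/sample                           # list directory contents
$ hdfs dfs -cat /user/sample/hamlet.txt | head        # stream file contents back to the client
$ hdfs dfs -get /user/sample/hamlet.txt ./hamlet.txt  # copy a file from HDFS to the local file system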
  • 36. What HDFS Looks Like (GUI) http://brillix.co.il36
  • 38. MapReduce is... • A programming model for expressing distributed computations at a massive scale • An execution framework for organizing and performing such computations • MapReduce can be written in Java, Scala, C, Python, Ruby and others • Concept: Bring the code to the data, not the data to the code http://brillix.co.il38
  • 39. The MapReduce Paradigm • Imposes key-value input/output • We implement two main functions: • MAP - Takes a large problem, divides it into sub-problems and performs the same function on every sub-problem Map(k1, v1) -> list(k2, v2) • REDUCE - Combines the output from all sub-problems (each key goes to the same reducer) Reduce(k2, list(v2)) -> list(v3) • Framework handles everything else (almost) 39 http://brillix.co.il
  • 40. MapReduce Word Count Process 40
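The word-count flow above is shown in the slides as a diagram; a minimal Java sketch of the two functions from the previous slide (class and variable names are illustrative, and the Job/driver setup is omitted) could look like this:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // MAP: split each input line into words and emit (word, 1) pairs -> list(k2, v2)
  public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // REDUCE: all values for the same key arrive at the same reducer; sum them -> list(v3)
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}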
  • 41. YARN •Takes care of distributed processing and coordination •Scheduling • Jobs are broken down into smaller chunks called tasks • These tasks are scheduled to run on data nodes •Task Localization with Data • Framework strives to place tasks on the nodes that host the segment of data to be processed by that specific task • Code is moved to where the data is 41 http://brillix.co.il
  • 42. YARN •Error Handling • Failures are an expected behavior so tasks are automatically re-tried on other machines •Data Synchronization • Shuffle and Sort barrier re-arranges and moves data between machines • Input and output are coordinated by the framework 42 http://brillix.co.il
  • 43. YARN Framework Support •With YARN, we can go beyond the Hadoop ecosystem •Support different frameworks: • MapReduce v2 • Spark • Giraph • Co-Processors for Apache HBase • More… http://brillix.co.il43
  • 44. Submitting a Job •The yarn script, given a jar file and a main class as arguments, launches a JVM and executes the provided job $ yarn jar HadoopSamples.jar mr.wordcount.StartsWithCountJob /user/sample/hamlet.txt /user/sample/wordcount/ http://brillix.co.il44
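Once submitted, the job can be followed from the same command line (the application IDs below are placeholders; assumes access to the YARN CLI):

$ yarn application -list                        # show running applications and their application IDs
$ yarn application -status <application_id>     # progress, state and tracking URL for one application
$ yarn logs -applicationId <application_id>     # aggregated container logs once the application has finished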
  • 47. Hadoop Main Problems • The Hadoop MapReduce framework (not the MapReduce paradigm) had some major problems: • Developing MapReduce was complicated – there was more than just business logic to develop • Transferring data between stages requires the intermediate data to be written to disk (and then read by the next step) • Multi-step jobs needed orchestration and abstraction solutions • Initial resource management was very painful – the MapReduce framework was based on fixed resource slots http://brillix.co.il47
  • 49. Improving Hadoop: Distributions • Core Hadoop is complicated, so tools and solution frameworks were added to make things easier • There are over 80 different Apache projects for big data solutions that use Hadoop (and the number is growing!) • Hadoop distributions collect some of these tools and release them as a complete integrated package • Good for integration – but still open source; we pay for support and integration 49
  • 50. Noticeable Distributions (on-premise) •Cloudera •Hortonworks •MapR •IBM InfoSphere http://brillix.co.il50
  • 51. Noticeable Distributions (cloud) •Amazon Web Services (S3 + EC2) •Microsoft Azure •Google Cloud Platform •IBM InfoSphere BigInsights http://brillix.co.il51
  • 52. Common HADOOP 2.0 Technology Eco System 52 http://brillix.co.il
  • 53. Improving Programmability •MapReduce code in Java is sometimes tedious, so different solutions came to the rescue • Pig: programming language that simplifies Hadoop actions: loading, transforming and sorting data • Hive: enables Hadoop to operate as a data warehouse using SQL-like syntax • Spark and other frameworks http://brillix.co.il53
  • 54. Pig • Pig is an abstraction on top of Hadoop • Provides a high level programming language designed for data processing • Scripts are converted into MapReduce code and executed on the Hadoop cluster • Makes ETL/ELT processing and other simple MapReduce jobs easier without writing MapReduce code • Pig was widely accepted and used by Yahoo!, Twitter, Netflix, and others • Often replaced by more up-to-date tools like Apache Spark http://brillix.co.il54
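As an illustration of how Pig hides the MapReduce plumbing, a word-count script in Pig Latin (the input path is reused from the earlier yarn example; the output path is made up) might look like this:

lines   = LOAD '/user/sample/hamlet.txt' USING TextLoader() AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
STORE counts INTO '/user/sample/pig_wordcount';

Pig compiles these few relational-style statements into MapReduce jobs behind the scenes.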
  • 55. Hive • Data Warehousing Solution built on top of Hadoop • Provides SQL-like query language named HiveQL • Minimal learning curve for people with SQL expertise • Data analysts are target audience • Early Hive development work started at Facebook in 2007 • Hive is an Apache top level project under Hadoop • http://hive.apache.org http://brillix.co.il55
  • 56. Hive Provides •Ability to bring structure to various data formats •Simple interface for ad hoc querying, analyzing and summarizing large amounts of data •Access to files on various data stores such as HDFS and HBase •Also see: Apache Impala (mainly in Cloudera) 56
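A small HiveQL sketch of both points – bringing structure to files already sitting in HDFS and then querying them (table name, columns and path are illustrative, not from the slides):

CREATE EXTERNAL TABLE access_log (ip STRING, ts STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/sample/access_log';

SELECT url, COUNT(*) AS hits
FROM access_log
GROUP BY url
ORDER BY hits DESC
LIMIT 10;

Hive turns the query into distributed jobs on the cluster, so the learning curve for SQL users stays minimal.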
  • 57. Databases and DB Connectivity •HBase: online NoSQL key/value, wide-column oriented datastore that is native to HDFS •Sqoop: a tool designed to import data from relational databases into HDFS, HBase, or Hive, and to export it back •Sqoop2: Sqoop centralized service (GUI, WebUI, REST) 57
  • 58. HBase • HBase is the closest thing we had to a database in the early Hadoop days • Distributed, wide-column oriented key/value NoSQL database, built on top of HDFS • Provides Bigtable-like capabilities • Does not have a query language: only get, put, and scan commands • Often compared with Cassandra (non-Hadoop native Apache project) and Aerospike http://brillix.co.il58
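The get/put/scan model can be tried from the HBase shell (table and column family names are illustrative):

hbase> create 'users', 'info'                       # table with a single column family
hbase> put 'users', 'user1', 'info:name', 'Alice'   # write one cell: row key, column, value
hbase> get 'users', 'user1'                         # read a single row by key
hbase> scan 'users', {LIMIT => 10}                  # iterate over rows, optionally by key range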
  • 59. When Do We Use HBase? •Huge volumes of randomly accessed data •HBase is at its best when it’s accessed in a distributed fashion by many clients (high consistency) •Consider HBase when we are loading data by key, searching data by key (or range), serving data by key, querying data by key or when storing data by row that doesn’t conform well to a schema. 59
  • 60. When NOT To Use HBase •HBase doesn’t use SQL, doesn’t have an optimizer, and doesn’t support transactions or joins •HBase doesn’t have data types •See the Apache Phoenix project for better data structure and a query language when using HBase 60
  • 61. Sqoop and Sqoop2 • Sqoop is a command line tool for moving data from RDBMS to Hadoop; Sqoop2 is a centralized service for running Sqoop jobs • Uses MapReduce to load the data from the relational database into HDFS • Can also export data from HBase to RDBMS • Comes with connectors to MySQL, PostgreSQL, Oracle, SQL Server and DB2. $bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --hive-import $bin/sqoop export --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --export-dir /data/lineitemData http://brillix.co.il61
  • 62. Improving Hadoop – More Useful Tools •For improving coordination: Zookeeper •For improving scheduling/orchestration: Oozie •For fast, in-memory SQL queries: Apache Impala •For improving log collection: Flume •Text search and data discovery: Solr •For improving UI and dashboards: Hue and Ambari http://brillix.co.il62
  • 63. Improving Hadoop – More Useful Tools (2) •Data serialization: Avro (rows) and Parquet (columns) •Data governance: Atlas •Security: Knox and Ranger •Data Replication: Falcon •Machine Learning: Mahout •Performance Improvement: Tez •And there are more… http://brillix.co.il63
  • 65. Is Hadoop the Only Big Data Solution? • No – there are other solutions: • Apache Spark and Apache Mesos frameworks • NoSQL systems (Apache Cassandra, CouchBase, MongoDB and many others) • Stream analysis (Apache Kafka, Apache Storm, Apache Flink) • Machine learning (Apache Mahout, Spark MLlib) • Some can be integrated with Hadoop, but some are independent http://brillix.co.il65
  • 66. Another Big Data Solution: Apache Spark •Apache Spark is a fast, general engine for large-scale data processing on a cluster •Originally developed at UC Berkeley in 2009 as a research project, and is now an open source Apache top level project •Main idea: use the memory resources of the cluster for better performance •It is now one of the fastest-growing big data projects http://brillix.co.il66
  • 68. Spark and Hadoop • Spark and Hadoop are built to co-exist – Spark interacts with the Hadoop ecosystem: Flume, Sqoop, HBase, Hive • Spark can use other storage systems (S3, local disks, NFS) but works best when combined with HDFS • Spark can use YARN for running jobs • Spark can also interact with tools outside the Hadoop ecosystem: Kafka, NoSQL (Cassandra, Aerospike, etc.), relational databases, and more 68
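For comparison with the Java MapReduce code earlier, the same word count from the interactive spark-shell (Scala; the HDFS path is reused from the earlier examples) is only a few lines:

scala> val lines  = sc.textFile("hdfs:///user/sample/hamlet.txt")
scala> val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("hdfs:///user/sample/spark_wordcount")

Intermediate results stay in cluster memory instead of being written to disk between stages, which is where much of Spark's speed-up over classic MapReduce comes from.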
  • 69. Okay, So Where Does the DBA Fit In? • Big Data solutions are not databases. Databases are probably not going to disappear, but we feel the change even today: DBAs must be ready for the change • DBAs are the perfect candidates to transition into Big Data experts: • They have system (OS, disk, memory, hardware) experience • They can understand data easily • DBAs are used to working with developers and other data users http://brillix.co.il69
  • 70. What Do DBAs Need Now? •DBAs will need to know more programming: Java, Scala, Python, R or any other popular language in the Big Data world will do •DBAs need to understand the position shifts, and the introduction of DevOps, Data Scientists, CDOs etc. •Big Data is changing daily: we need to learn, read, and be involved before we are left behind… http://brillix.co.il70
  • 71. Summary • Big Data is here – it’s complicated, and RDBMS alone does not fit anymore • Big Data solutions are evolving; Hadoop is an example of such a solution • Spark is a very popular Big Data solution • DBAs need to be ready for the change: Big Data solutions are not databases, and we should make ourselves ready http://brillix.co.il71
  • 73. Thank You and don’t forget to evaluate! Zohar Elkayam twitter: @realmgic [email protected] www.realdbamagic.com http://brillix.co.il73