HADOOP AT THE CENTER:
THE NEXT GENERATION OF HADOOP

Adam Muise
Principal Architect
Hortonworks
  
Who am I?

Who is Hortonworks?
  
We do Hadoop

The leaders of Hadoop’s development
Community driven, Enterprise Focused
Drive Innovation in the platform – We lead the roadmap
100% Open Source – Democratized Access to Data
  
Let’s talk challenges…
  
[Slide graphic: the word "Volume" repeated until it fills the slide]
  
Storage, Management, Processing all become challenges with Data at Volume

Traditional technologies adopt a divide, drop, and conquer approach
  
The solution?
  
[Slide graphic: data divided among separate stores: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  
Ummm…you dropped something
  
[Slide graphic: a large pile of unlabeled data shown alongside the same stores: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  
What keeps us from our Data?

Data Silos.

Your data silos are lonely places.
  
[Slide graphic: separate data silos: EDW, Accounts, Customers, Web Properties]
  
… Data likes to be together.
  
[Slide graphic: EDW, Accounts, Customers, Web Properties]
  
Data likes to socialize too.
  
[Slide graphic: EDW, Accounts, Customers, and Web Properties joined by Machine Data, Twitter, Facebook, CDR, and Weather Data]
  
New types of data don’t quite fit into your pristine view of the world.
  
[Slide graphic: "My Little Data Empire", with Logs and Machine Data shown separately and marked with question marks]
  
To resolve this, some people take hints from Lord Of The Rings...

…and create One-Schema-To-Rule-Them-All…
  
[Slide graphic: an EDW holding all data under a single Schema]
  
…but that has its problems too.
  
[Slide graphic: the EDW and its Schema surrounded by multiple ETL processes and additional data]
  
What if the data was processed and stored centrally?
What if you didn’t need to force it into a single schema?

We call it a Modern Data Architecture*

*AKA Data Lake
  
A Modern Data Architecture
•  Consolidate siloed data sets, structured and unstructured
•  Central data set on a single cluster
•  Multiple workloads across batch, interactive, and real time
•  Central services for security, governance, and operation
•  Preserve existing investment in current tools and platforms
•  Single view of the customer, product, supply chain
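
One way to picture "consolidate structured and unstructured data without forcing a single schema" is to keep raw records as they arrive and apply a field list only when reading. The sketch below is a hypothetical illustration, not something taken from the deck: the function name read_with_schema, the JSON-lines format, and the sample fields are all assumptions.

```python
import json


def read_with_schema(raw_lines, fields):
    """Parse raw JSON lines and project only the requested fields.

    The raw lines are stored untouched; each reader chooses its own
    field list. Fields a record does not have come back as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Hypothetical raw records from two different sources, stored side by side.
raw = [
    '{"user": "a", "page": "/home"}',
    '{"user": "b", "temp_c": 21.5}',
]

# A clickstream reader and a sensor reader each see the shape they ask for.
clicks = list(read_with_schema(raw, ["user", "page"]))
sensors = list(read_with_schema(raw, ["user", "temp_c"]))
```

The design choice shown is that no record is rejected or dropped at load time for failing to match one central schema; the projection happens per reader.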
[Diagram: three layers.
 APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications.
 DATA SYSTEM: RDBMS, EDW, MPP, and Hadoop, with YARN: Data Operating System (Batch, Interactive, Real-Time) running over HDFS (Hadoop Distributed File System) on nodes 1 to N.
 SOURCES: Existing Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]
  
Hadoop Primer…

{Storage + Processing} = {HDFS + YARN}

{Storage} = {HDFS}
  
HDFS stores data in blocks and replicates those blocks
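
A minimal sketch of the idea on this slide, assuming a toy model: a file is cut into fixed-size blocks and each block is assigned to three distinct nodes. The function names, node names, and round-robin placement are illustrative assumptions; real HDFS placement is rack-aware and is decided by the NameNode.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign every block to `replication` distinct nodes, round-robin.

    Toy policy only: it ignores racks, disk usage, and node load.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical 10-byte file, 4-byte blocks, four-node cluster.
blocks = split_into_blocks(b"abcdefghij", 4)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
```

Here each of the three blocks ends up on three different nodes, matching the three copies of block1, block2, and block3 drawn on the slide.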
  
[Diagram: nine nodes, each with Storage and Processing; block1, block2, and block3 each appear three times]
  
If a block fails then HDFS always has the other copies and heals itself
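
The "heals itself" step can be sketched as re-replication: when a node is lost, every block that fell below its target copy count is copied to another live node. This is a toy model under stated assumptions (the heal function, node names, and first-available target choice are hypothetical); in real HDFS the NameNode schedules this work from DataNode heartbeats and block reports.

```python
def heal(placement: dict, live_nodes: list, replication: int = 3) -> dict:
    """Drop replicas on dead nodes, then top each block back up to `replication`.

    Raises if every copy of a block is gone, since nothing is left to copy from.
    """
    healed = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n in live_nodes]
        if not survivors:
            raise RuntimeError(f"block {block_id} lost: no surviving replica")
        candidates = [n for n in live_nodes if n not in survivors]
        missing = max(0, replication - len(survivors))
        healed[block_id] = survivors + candidates[:missing]
    return healed


# Hypothetical cluster state: node n2 has just failed, n5 is a spare.
before = {0: ["n1", "n2", "n3"], 1: ["n2", "n3", "n4"]}
after = heal(before, live_nodes=["n1", "n3", "n4", "n5"])
```

After the call both blocks are back to three copies, none of them on the failed node.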
  
[Diagram: the same nine nodes and block copies, with one failed copy marked X]
{Processing} = {YARN}
  
Actually YARN provides a framework for processing…
  
YARN = Yet Another Resource Negotiator

Resource Manager + Node Managers = YARN
  
[Diagram: a Resource Manager with its Scheduler, and Node Managers hosting AppMasters and Containers for MapReduce, Storm, and Pig]
  
YARN abstracts resource management so you can run more than just MapReduce
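
To make the abstraction concrete, here is a toy sketch in which a resource manager hands out memory-sized containers on node managers without caring which framework asks. The class names mirror the slide's terms, but the code is an assumption for illustration and is not the YARN API: real YARN has per-application AppMasters negotiating containers and pluggable schedulers, while this sketch uses plain first-fit on memory.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node with a memory budget and the containers it runs."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Grants containers first-fit; it never inspects what the app does."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app: str, memory_mb: int):
        """Return the node name that received the container, or None if full."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                node.containers.append((app, memory_mb))
                return node.name
        return None


# Hypothetical two-node cluster shared by different frameworks.
rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
grants = [
    rm.allocate("MapReduce", 3072),
    rm.allocate("Storm", 2048),
    rm.allocate("Pig", 1024),
    rm.allocate("Tez", 1024),
]
```

MapReduce, Storm, and Pig all receive containers from the same pool; the last request gets None because the cluster is full, not because of which framework made it.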
  
[Diagram: HDFS2 at the base and YARN above it; running on YARN: MapReduce V2, Tez, Slider, Storm, Spark, MPI, Giraph, … and more]
  
YARN is like the OS, and the processing frameworks are Apps. Slider, Tez, and Spark are all significant distributed processing frameworks that host a multitude of Data Tools.
  
[Diagram: Yet Another Resource Negotiator at the base; Slider, Tez, and Spark above it; on top of those: HBase, Storm, Cascading, Hive, Pig, SQL, Streaming, MLLib]
  
Analytics needs Data.
Work with the population, not a sample.
  
Your segmentation today.

Male
Female
Age: 25-30
Town/City
Middle Income Band
Product Category Preferences
  
Your segmentation with better data.

Male
Female
Age: 27 but feels old
GPS coordinates
$65-68k per year
Product recommendations per time of day and per weather
Tea Party
Hippie
Looking to start a business
Walking into Starbucks right now…
A depressed Toronto Maple Leafs Fan
Products left in basket indicate drunk Amazon shopper
Purchase history indicates a risk taker
Thinking about a new house
Unhappy with his cell phone plan
Pregnant
Spent 25 minutes looking at tea cozies
  
Don’t wait for your data.
Batch is often too late to influence the person who is in your store or on your website right now.
  
Enter Streaming
…aka Micro-Batching, aka Micro-ETL, aka near-realtime
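
Micro-batching, in its simplest form, groups an unbounded stream of events into small lists that are handed downstream as soon as they fill up. The generator below is a minimal sketch under that assumption; the name micro_batches and the size-only trigger are illustrative, and production systems typically also flush on a time interval.

```python
from typing import Iterable, Iterator


def micro_batches(events: Iterable, max_size: int) -> Iterator[list]:
    """Yield consecutive batches of at most `max_size` events.

    A partially filled final batch is still emitted, so no event is dropped.
    """
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == max_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical stream of seven click events, processed three at a time.
batches = list(micro_batches(range(7), max_size=3))
```

Each small batch can be loaded or scored within moments of arrival, rather than waiting for a nightly batch window.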
  
Streaming Processing, Search, and Storage

[Diagram: Hortonworks Data Platform 2.2. Real-time data feeds enter through Apache Kafka. Running on YARN and HDFS: Search (Solr), Slider, Online Data Processing (HBase), Real Time Stream Processing (Storm), SQL (Hive Streaming Ingest).]

Stream data into Hadoop and process it in near real-time
  
Hortonworks Data Platform:
A comprehensive data management platform

[Diagram: Hortonworks Data Platform 2.2
•  Data management: HDFS (Hadoop Distributed File System) under YARN: Data Operating System (Cluster Resource Management)
•  Batch, interactive & real-time data access: Script (Pig), SQL (Hive), Java/Scala (Cascading), each on Tez; Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), Slider; In-Memory (Spark); Others (ISV Engines)
•  Governance: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
•  Security: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger)
•  Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
•  Deployment Choice: Linux, Windows, On-Premises, Cloud]
HDP = Apache Hadoop

[Chart: Apache component versions shipped in each Hortonworks Data Platform release: HDP 2.0 (October 2013), HDP 2.1 (April 2014), HDP 2.2 (October 2014). Components are grouped under Data Management, Data Access, Governance & Integration, Security, and Operations: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon, Ranger, Spark, Kafka, Tez, Slider, Solr.]
What else are we working on?

hortonworks.com/labs/
  
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
  
There is NO second place

Hortonworks
…the Bull Elephant of Hadoop Innovation
  
2014 sept 26_thug_lambda_part1
Adam Muise
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
Adam Muise
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
Adam Muise
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
Adam Muise
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Adam Muise
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
Adam Muise
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
Adam Muise
 
2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog
Adam Muise
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
Adam Muise
 
2012 sept 18_thug_biotech
2012 sept 18_thug_biotech2012 sept 18_thug_biotech
2012 sept 18_thug_biotech
Adam Muise
 
hadoop 101 aug 21 2012 tohug
 hadoop 101 aug 21 2012 tohug hadoop 101 aug 21 2012 tohug
hadoop 101 aug 21 2012 tohug
Adam Muise
 
Ad

Recently uploaded (20)

TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 

Next Generation Hadoop Introduction

  • 1. HADOOP AT THE CENTER: THE NEXT GENERATION OF HADOOP. Adam Muise, Principal Architect, Hortonworks
  • 3. Who is Hortonworks?
  • 4. We do Hadoop. The leaders of Hadoop’s development. Community driven, Enterprise Focused. Drive Innovation in the platform – We lead the roadmap. 100% Open Source – Democratized Access to Data
  • 7. Volume [the word repeated across the slide]
  • 8. Volume [repeated more densely]
  • 9. Volume [repeated until it fills the slide]
  • 10. Storage, Management, Processing all become challenges with Data at Volume
  • 11. Traditional technologies adopt a divide, drop, and conquer approach
  • 12. The solution? [Diagram: data split across separate stores: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  • 13. Ummm…you dropped something [Diagram: the same stores (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW), with a large pile of data left outside them]
  • 14. What keeps us from our Data?
  • 15. Data Silos. Your data silos are lonely places. [Diagram: separate silos: EDW, Accounts, Customers, Web Properties]
  • 16. … Data likes to be together. [Diagram: EDW, Accounts, Customers, Web Properties]
  • 17. Data likes to socialize too. [Diagram: EDW, Accounts, Customers, Web Properties, Machine Data, Twitter, Facebook, CDR, Weather Data]
  • 18. New types of data don’t quite fit into your pristine view of the world. [Diagram: "My Little Data Empire" next to Logs and Machine Data, marked with question marks]
  • 19. To resolve this, some people take hints from Lord Of The Rings...
  • 20. …and create One-Schema-To-Rule-Them-All… [Diagram: EDW data under a single Schema]
  • 21. …but that has its problems too. [Diagram: EDW data under a single Schema, with additional data passing through repeated ETL steps]
  • 22. What if the data was processed and stored centrally? What if you didn’t need to force it into a single schema? We call it a Modern Data Architecture* (*AKA Data Lake)
  • 23. A Modern Data Architecture
      • Consolidate siloed data sets, structured and unstructured
      • Central data set on a single cluster
      • Multiple workloads across batch, interactive and real time
      • Central services for security, governance and operation
      • Preserve existing investment in current tools and platforms
      • Single view of the customer, product, supply chain
      [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (RDBMS, EDW, MPP; YARN: Data Operating System with Batch, Interactive, and Real-Time workloads on nodes 1…N; HDFS (Hadoop Distributed File System)); SOURCES (Existing Systems: CRM, ERP, Other; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured)]
  • 25. {Storage + Processing} = {HDFS + YARN}
  • 27. HDFS stores data in blocks and replicates those blocks [Diagram: nine nodes, each with Storage and Processing; block1, block2, and block3 each appear three times]
  • 28. If a block fails then HDFS always has the other copies and heals itself [Diagram: the same cluster, with one block copy marked X]
  • 30. Actually YARN provides a framework for processing…
  • 31. YARN = Yet Another Resource Negotiator
  • 32. Resource Manager + Node Managers = YARN [Diagram: a Resource Manager with its Scheduler; Node Managers hosting AppMasters and Containers for MapReduce, Storm, and Pig]
  • 33. YARN abstracts resource management so you can run more than just MapReduce [Diagram: HDFS2, YARN, MapReduce V2, Tez, Slider, Storm, Spark, Giraph, MPI, … and more]
  • 34. YARN is like the OS, and the processing frameworks are Apps. Slider, Tez, and Spark are all significant distributed processing frameworks that host a multitude of Data Tools. [Diagram: Yet Another Resource Negotiator; Slider, Tez, Spark; HBase, Storm, Cascading, Hive, Pig, SQL, Streaming, MLLib]
  • 35. Analytics needs Data. Work with the population, not a sample.
  • 36. Your segmentation today. Male; Female; Age: 25-30; Town/City; Middle Income Band; Product Category Preferences
  • 37. Your segmentation with better data. Male; Female; Age: 27 but feels old; GPS coordinates; $65-68k per year; Product recommendations per time of day and per weather; Tea Party; Hippie; Looking to start a business; Walking into Starbucks right now…; A depressed Toronto Maple Leafs fan; Products left in basket indicate drunk Amazon shopper; Purchase history indicates a risk taker; Thinking about a new house; Unhappy with his cell phone plan; Pregnant; Spent 25 minutes looking at tea cozies
  • 38. Don’t wait for your data. Batch is often too late to influence the person who is in your store or on your website right now.
  • 39. Enter Streaming …aka Micro-Batching, aka Micro-ETL, aka near-real-time
  • 40. Streaming Processing, Search, and Storage. Hortonworks Data Platform 2.2. Stream data into Hadoop and process it in near real-time. [Diagram: Real-time data feeds; APACHE KAFKA; YARN; HDFS; Search (Solr); Slider; Online Data Processing (HBase); Real Time Stream Processing (Storm); SQL (Hive); Streaming Ingest]
  • 41. Hortonworks Data Platform: A comprehensive data management platform. Hortonworks Data Platform 2.2. [Diagram: YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System); BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider; GOVERNANCE: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS); SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger); OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); Deployment Choice: Linux, Windows, On-Premises, Cloud]
  • 42. HDP = Apache Hadoop. Hortonworks Data Platform 2.2. Components: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon, Ranger, Spark, Kafka, Tez, Slider, Solr. Categories: Data Management, Data Access, Governance & Integration, Security, Operations. Releases: HDP 2.0 (October 2013), HDP 2.1 (April 2014), HDP 2.2 (October 2014). [Version table, column alignment lost in extraction: 2.2.0, 0.12.0, 0.12.0, 2.4.0, 0.12.1, 0.13.0, 0.96.1, 0.98.0, 0.9.1, 1.4.4, 1.3.1, 1.4.0, 1.4.4, 1.5.1, 3.3.2, 4.0.0, 3.4.5, 0.4.0, 4.0.0, 1.5.1, Falcon 0.5.0, 0.14.0, 0.14.0, 0.98.4, 1.6.1, 4.2, 0.9.3, 1.2.0, 0.6.0, 0.8.1, 1.4.5, 1.5.0, 1.7.0, 4.1.0, 0.5.0, 0.4.0, 2.6.0, 3.4.5, Tez 0.4.0, Slider 0.60, Solr 4.7.2, 4.10.0, 0.5.1]
  • 43. What else are we working on? hortonworks.com/labs/
  • 44. © Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION. There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation
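
Slides 27 and 28 describe HDFS storing data in replicated blocks and restoring lost copies from the surviving ones. The idea can be sketched as a toy simulation in Python. This is not HDFS code: the function names, the random placement (real HDFS placement is rack-aware), and the replication factor of 3 (taken from the three copies of each block drawn on slide 27) are all illustrative assumptions.

```python
import random

REPLICATION = 3  # assumed: three copies per block, as drawn on slide 27


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}


def fail_node(placement, node):
    """Drop every replica that was held by the failed node."""
    for holders in placement.values():
        holders.discard(node)


def heal(placement, live_nodes, replication=REPLICATION, seed=1):
    """Re-copy under-replicated blocks onto live nodes that do not hold them yet."""
    rng = random.Random(seed)
    for block, holders in placement.items():
        if not holders:
            # With no surviving copy there is nothing to copy from.
            raise RuntimeError(f"all replicas of {block} were lost")
        missing = max(0, replication - len(holders))
        candidates = [n for n in live_nodes if n not in holders]
        holders.update(rng.sample(candidates, missing))


# Hypothetical nine-node cluster, mirroring the nine nodes on the slide.
nodes = [f"node{i}" for i in range(1, 10)]
placement = place_blocks(["block1", "block2", "block3"], nodes)

victim = sorted(placement["block1"])[0]   # a node known to hold block1
fail_node(placement, victim)
heal(placement, [n for n in nodes if n != victim])
```

After `heal` runs, every block is back to three copies and none of them sits on the failed node. The sketch also makes the limit of the slide's claim visible: healing works only while at least one replica of a block survives.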