Discover HDP 2.2 
Data Storage Innovations in Hadoop Distributed File System (HDFS) 
Page 1 © Hortonworks Inc. 2014 
Hortonworks. We do Hadoop.
Speakers 
• Rohit Bakhshi: Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform 
• Jitendra Pandey: Hortonworks Senior Architect for HDFS
Agenda 
• Overview of HDFS 
• New HDFS Innovation in HDP 2.2 
– Heterogeneous storage 
– Encryption 
– Operational security enhancements 
• Q & A 
We’ll move quickly: 
• Attendee phone lines are muted 
• Text any questions to Jitendra using Webex chat 
• Questions will be answered at the end of the call 
• Unanswered questions and answers in upcoming FAQ/blog post 
Big Data, Hadoop & Data Center Re-platforming 
Business Drivers 
• From reactive analytics to proactive interactions 
• Insights that drive competitive advantage & optimal returns 
Financial Drivers 
• Cost of data systems, as % of IT spend, continues to grow 
• Cost advantages of commodity hardware & open source software 
Technical Drivers 
• Data is growing exponentially & existing systems are overwhelmed 
• Predominantly driven by NEW types of data that can inform analytics 
There is an inequitable balance between vendor and customer in the market
Hadoop Value: New Types of Data 
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website 
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines 
• Server Logs: Research logs to diagnose process failures and prevent security breaches 
• Sentiment: Understand how your customers feel about your brand and products, right now 
• Geographic: Analyze location-based data to manage operations where they occur 
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
A Shift from Reactive to Proactive Interactions 
HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision) 
A shift in Advertising 
From mass branding …to 1x1 Targeting 
A shift in Financial Services 
From Educated Investing …to Automated Algorithms 
A shift in Healthcare 
From mass treatment …to Designer Medicine 
A shift in Retail 
From static branding …to Real-time Personalization 
A shift in Telco 
From break then fix …to repair before break
Enterprise Goals for the Modern Data Architecture 
• Consolidate siloed data sets, structured and unstructured 
• Central data set on a single cluster 
• Multiple workloads across batch, interactive and real time 
• Central services for security, governance and operation 
• Preserve existing investment in current tools and platforms 
• Single view of the customer, product, supply chain 
[Diagram: Modern Data Architecture. SOURCES (existing systems such as CRM, ERP and others; clickstream; web & social; geolocation; sensor & machine; server logs; unstructured) feed the DATA SYSTEM (RDBMS, EDW and MPP alongside YARN: Data Operating System on HDFS, running batch, interactive and real-time workloads), which serves APPLICATIONS (business analytics, custom applications, packaged applications).]
YARN Transformed Hadoop & Opened a New Era 
YARN: The Architectural Center of Hadoop 
• Common data platform, many applications 
• Support multi-tenant access & processing 
• Batch, interactive & real-time use cases 
[Diagram: batch, interactive & real-time data access engines running on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark) and other ISV engines, with Tez and Slider as intermediate layers.]
YARN Extends Hadoop to Other Data Center Leaders 
YARN: The Architectural Center of Hadoop 
• Common data platform, many applications 
• Support multi-tenant access & processing 
• Batch, interactive & real-time use cases 
• Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.) 
YARN Ready Applications 
Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions 
[Diagram: batch, interactive & real-time data access engines running on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark) and other ISV engines, with Tez and Slider as intermediate layers.]
Enterprise Hadoop: Central Set of Services 
Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for: 
• Governance: Load data and manage according to policy 
• Operations: Deploy and effectively manage the platform 
– Provision, Manage & Monitor: Ambari, Zookeeper 
– Scheduling: Oozie 
• Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection 
Everything that plugs into Hadoop inherits these services 
[Diagram: GOVERNANCE, SECURITY and OPERATIONS services surrounding the batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, other ISV engines, with Tez and Slider) on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System).]
Hortonworks Development Investment for the Enterprise 
Vertical Integration with YARN and HDFS 
[Diagram: the Enterprise Hadoop stack. Batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, other ISV engines, with Tez and Slider) on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System), flanked by GOVERNANCE, SECURITY and OPERATIONS services (Ambari, Zookeeper, Oozie).] 
• Ensure engines can run reliably and respectfully in a YARN based cluster 
• Implement features throughout the stack to accommodate
Hortonworks Development Investment for the Enterprise 
Horizontal Integration for Enterprise Services 
[Diagram: the Enterprise Hadoop stack. Batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, other ISV engines, with Tez and Slider) on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System), flanked by GOVERNANCE, SECURITY and OPERATIONS services (Ambari, Zookeeper, Oozie).] 
• Ensure consistent enterprise services are applied across the entire Hadoop stack 
• Integrate with and extend existing data center solutions for these key requirements
HDP Delivers Enterprise Hadoop 
Hortonworks Data Platform 2.2 
YARN is the architectural center of HDP 
• Common data set across all applications 
• Batch, interactive & real-time workloads 
• Multi-tenant access & processing 
Provides comprehensive enterprise capabilities 
• Governance 
• Security 
• Operations 
Enables broad ecosystem adoption 
• ISVs can plug directly into Hadoop 
The widest range of deployment options 
• Linux & Windows 
• On premises & cloud 
[Diagram: HDP 2.2 stack. 
GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. 
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), other ISV engines, with Tez and Slider, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System). 
SECURITY (Authentication, Authorization, Audit, Data Protection): Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox. 
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). 
Deployment Choice: Linux, Windows, On-Premises, Cloud.]
Overview of HDFS 
HDFS enables the Common Data Platform 
HDFS: Storage Platform for Modern Data Architecture 
• Common data platform across multiple application workloads 
• Reliable 
• Scalable 
• Cost Efficient 
[Diagram: batch, interactive & real-time data access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, other ISV engines, with Tez and Slider) on YARN: Data Operating System (Cluster Resource Management), all sharing HDFS (Hadoop Distributed File System).]
HDFS Innovations on HDP 2.2 
HDFS in HDP 2.2: What’s New 
Heterogeneous Storage (theme: Heterogeneous Storage) 
• Archive and SSD Tiers 
• Tech Preview: Enable intermediate data to be stored in memory 
Encryption (theme: Security) 
• Tech Preview: Transparent Data Encryption 
DataNode does not require Root to start (theme: Security) 
• HDFS services in a Kerberized cluster no longer need Root to start
New in HDP 2.2: 
Heterogeneous Storage 
Heterogeneous Storage 
Before 
• DataNode is a single storage 
• Storage is uniform - Only storage type Disk 
• Storage types hidden from the file system 
New Architecture 
• DataNode is a collection of storages 
• Support different types of storages 
– Disk, SSDs, Memory 
[Diagram: before, all disks as a single storage; in the new architecture, a collection of tiered storages. S3, Swift, SAN and Filers are also shown.]
HDFS Storage Architecture - Now 
Storage Policies: Archival 
• Hot: All replicas on DISK 
• Warm: 1 replica on DISK, others on ARCHIVE 
• Cold: All replicas on ARCHIVE 
[Diagram: HDP Cluster whose nodes hold DISK and ARCHIVE storage]
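The Hot, Warm and Cold policies on this slide can be read as a rule that maps a replica count to a storage type per replica. The following is a minimal sketch of that rule only; it is not HDFS code, the function name `replica_storage_types` is hypothetical, and the default replication of 3 is an assumption rather than something the slide states.

```python
# Illustrative model of the Hot / Warm / Cold policies; not HDFS code.
# The function name and the default replication factor are assumptions.
from typing import List

DISK, ARCHIVE = "DISK", "ARCHIVE"


def replica_storage_types(policy: str, replication: int = 3) -> List[str]:
    """Return the storage type used for each replica under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    name = policy.lower()
    if name == "hot":
        # Hot: all replicas on DISK
        return [DISK] * replication
    if name == "warm":
        # Warm: 1 replica on DISK, others on ARCHIVE
        return [DISK] + [ARCHIVE] * (replication - 1)
    if name == "cold":
        # Cold: all replicas on ARCHIVE
        return [ARCHIVE] * replication
    raise ValueError(f"unknown storage policy: {policy!r}")


for policy_name in ("Hot", "Warm", "Cold"):
    print(policy_name, replica_storage_types(policy_name))
```

Moving a data set from Hot to Warm to Cold in this model shifts replicas from DISK to ARCHIVE one policy step at a time, which is the archival progression the slide illustrates.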
Storage Policy: SSD 
• DataSet A: All replicas on SSD 
[Diagram: HDP Cluster whose nodes each hold SSD and DISK storage; the replicas of DataSet A (marked “A”) sit on SSD]
Store Intermediate Data in Memory 
Tech Preview feature 
For data writes that: 
– Need low latency 
– Produce data that can be regenerated 
[Diagram: the application process writes a block to the memory tier (RAM_DISK); the block is then lazily persisted to disk]
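The write path on this slide (write the block to memory, persist it to disk later) can be sketched with a toy store. This is a minimal illustration under stated assumptions, not the HDFS implementation: the class `LazyPersistStore` and its methods are hypothetical, a dictionary stands in for the RAM_DISK tier, and persistence is triggered by an explicit call instead of a background process.

```python
# Toy model of "write to memory, lazily persist to disk"; not HDFS code.
# LazyPersistStore and its method names are hypothetical.
import os
import tempfile
from typing import Dict, Set


class LazyPersistStore:
    """Blocks land in memory first and reach disk only when persisted."""

    def __init__(self, disk_dir: str) -> None:
        self.disk_dir = disk_dir
        self.memory: Dict[str, bytes] = {}  # stands in for the RAM_DISK tier
        self.unpersisted: Set[str] = set()  # block ids not yet on disk

    def write_block(self, block_id: str, data: bytes) -> None:
        # The write completes as soon as the block is held in memory.
        self.memory[block_id] = data
        self.unpersisted.add(block_id)

    def persist_pending(self) -> int:
        # Copy every not-yet-persisted block to disk; return how many.
        count = 0
        for block_id in sorted(self.unpersisted):
            path = os.path.join(self.disk_dir, block_id)
            with open(path, "wb") as handle:
                handle.write(self.memory[block_id])
            count += 1
        self.unpersisted.clear()
        return count

    def read_block(self, block_id: str) -> bytes:
        # Serve from memory when possible, otherwise from disk.
        if block_id in self.memory:
            return self.memory[block_id]
        with open(os.path.join(self.disk_dir, block_id), "rb") as handle:
            return handle.read()


with tempfile.TemporaryDirectory() as disk_dir:
    store = LazyPersistStore(disk_dir)
    store.write_block("blk_1", b"intermediate data")
    print("on disk before persist:", os.listdir(disk_dir))
    store.persist_pending()
    print("on disk after persist:", os.listdir(disk_dir))
```

Between `write_block` and `persist_pending` the block exists only in memory, so a failure in that window loses it. That gap is why the slide limits the feature to data that can be regenerated.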
New in HDP 2.2: 
Encryption 
HDFS Transparent Data Encryption 
• HDFS Encryption – Transparent Encryption in HDFS – HDFS-6134 
– Designate a dir as encryption zone, all files in the zone are encrypted 
– Dependency on Key Management Server 
• Key Management Server - HADOOP-10433 
– The custodian for all encryption keys in Hadoop 
– REST API for key CRUD operations 
• Key Provider API - HADOOP-10141 
– API to allow Hadoop code (NN, DN, DFS Clients) CRUD operations on key material 
HDFS Transparent Data Encryption 
[Diagram: HDFS Transparent Data Encryption (HDFS-6134). The HDFS Client reads and writes through a Crypto Stream (r/w with DEK). The Name Node holds the Encryption Zone (attributes: EZKey ID, version) and the Encrypted File (attributes: EDEK, IV). The Client and the Name Node reach the Key Management System (KMS, HADOOP-10433) through the KeyProvider API (HADOOP-10141); the KMS holds the EZKs and exchanges EDEKs and DEKs.] 
Acronym: Description 
• EZ: Encryption Zone (an HDFS directory) 
• EZK: Encryption Zone Key; master key associated with all files in an EZ 
• DEK: Data Encryption Key, unique key associated with each file. EZ Key used to generate DEK 
• EDEK: Encrypted DEK, Name Node only has access to encrypted DEK. 
• IV: Initialization Vector
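The relationship between EZK, DEK and EDEK can be made concrete with a toy model. The sketch below is a teaching aid under loud assumptions, not real cryptography and not the HDFS or KMS API: `toy_cipher` is a hash-based XOR stream standing in for a real cipher, `ToyKMS` and its methods are hypothetical, and the sketch assumes the DEK is generated randomly and then wrapped with the EZK to form the EDEK.

```python
# Toy model of the EZK / DEK / EDEK hierarchy. NOT real cryptography:
# toy_cipher is a stand-in for a real cipher and must not protect real data.
# ToyKMS and all method names are hypothetical.
import hashlib
import os
from typing import Dict


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 based keystream; applying it twice decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds the EZKs; callers only ever see DEKs and EDEKs."""

    def __init__(self) -> None:
        self._ezks: Dict[str, bytes] = {}

    def create_zone_key(self, name: str) -> None:
        self._ezks[name] = os.urandom(32)

    def generate_edek(self, ezk_name: str) -> bytes:
        # Assumption: a random per-file DEK is wrapped with the zone's EZK.
        dek = os.urandom(32)
        wrap_iv = os.urandom(16)
        return wrap_iv + toy_cipher(self._ezks[ezk_name], wrap_iv, dek)

    def decrypt_edek(self, ezk_name: str, edek: bytes) -> bytes:
        wrap_iv, wrapped = edek[:16], edek[16:]
        return toy_cipher(self._ezks[ezk_name], wrap_iv, wrapped)


kms = ToyKMS()
kms.create_zone_key("zone1-key")

# File creation: the file's metadata keeps only the EDEK and an IV.
file_attrs = {
    "ezk": "zone1-key",
    "edek": kms.generate_edek("zone1-key"),
    "iv": os.urandom(16),
}

# Client write: obtain the DEK via the KMS, then encrypt the file data.
dek = kms.decrypt_edek(file_attrs["ezk"], file_attrs["edek"])
ciphertext = toy_cipher(dek, file_attrs["iv"], b"sensitive record")

# Client read: the same EDEK yields the same DEK, which decrypts the data.
dek_again = kms.decrypt_edek(file_attrs["ezk"], file_attrs["edek"])
plaintext = toy_cipher(dek_again, file_attrs["iv"], ciphertext)
print(plaintext)
```

In this model the metadata holder (the Name Node's role on the slide) stores `file_attrs` but cannot read the data, because turning the EDEK back into a DEK requires the EZK held by the key server.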
New in HDP 2.2: 
Operational Security Enhancements 
DataNode does not require root 
Enables organizations to run services without root privilege 
For Kerberized clusters: 
• DataNode no longer needs to run as the Linux root user when starting 
• DataNode no longer needs to bind to privileged ports 
• DataNode uses SASL to transfer blocks between HDFS clients and DataNodes
Q & A 
Thank you! 
Learn more at: hortonworks.com/hadoop/hdfs/ 
Register for the remaining 4 Discover HDP 2.2 Webinars: Hortonworks.com/webinars
Ad

More Related Content

What's hot (20)

Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
Hortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Hortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Hortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
Hortonworks
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
Hortonworks
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
Hortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
Hortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
Hortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Hortonworks
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
Hortonworks
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your Budget
Hortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
Hortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
Hortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Hortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Hortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
Hortonworks
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
Hortonworks
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
Hortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
Hortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
Hortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Hortonworks
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
Hortonworks
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your Budget
Hortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
Hortonworks
 

Viewers also liked (17)

Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
Wangda Tan
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
Chris Nauroth
 
Sgi infinite storage gateway
Sgi infinite storage gatewaySgi infinite storage gateway
Sgi infinite storage gateway
inside-BigData.com
 
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der PraxisGanzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Fujitsu Central Europe
 
Network data storage
Network data storageNetwork data storage
Network data storage
Hadi Fadlallah
 
Storage as a service v4 eng
Storage as a service v4 engStorage as a service v4 eng
Storage as a service v4 eng
Dell EMC
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
DataWorks Summit
 
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud ServiceOracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Jean-Philippe PINTE
 
Future of Data Storage in the Cloud
Future of Data Storage in the CloudFuture of Data Storage in the Cloud
Future of Data Storage in the Cloud
Bret Piatt
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
Hortonworks
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
Hortonworks
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks
 
4×4: Big Data in der Cloud
4×4: Big Data in der Cloud4×4: Big Data in der Cloud
4×4: Big Data in der Cloud
Danny Linden
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Hortonworks
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
Hortonworks
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
Hortonworks
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
Wangda Tan
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
Chris Nauroth
 
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der PraxisGanzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Ganzheitliche Speicherlösungen: Unser Storage-Konzept in der Praxis
Fujitsu Central Europe
 
Storage as a service v4 eng
Storage as a service v4 engStorage as a service v4 eng
Storage as a service v4 eng
Dell EMC
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
DataWorks Summit
 
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud ServiceOracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Jean-Philippe PINTE
 
Future of Data Storage in the Cloud
Future of Data Storage in the CloudFuture of Data Storage in the Cloud
Future of Data Storage in the Cloud
Bret Piatt
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
Hortonworks
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
Hortonworks
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks
 
4×4: Big Data in der Cloud
4×4: Big Data in der Cloud4×4: Big Data in der Cloud
4×4: Big Data in der Cloud
Danny Linden
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Hortonworks
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
Hortonworks
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
Hortonworks
 
Ad

Discover HDP 2.2: Data Storage Innovations in Hadoop Distributed File System (HDFS)

  • 1. Discover HDP 2.2 Data Storage Innovations in Hadoop Distributed File System (HDFS) Page 1 © Hortonworks Inc. 2014 Hortonworks. We do Hadoop.
  • 2. Speakers: Rohit Bakhshi, Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform; Jitendra Pandey, Hortonworks Senior Architect for HDFS
  • 3. Agenda • Overview of HDFS • New HDFS Innovation in HDP 2.2 – Heterogeneous storage – Encryption – Operational security enhancements • Q & A We’ll move quickly: • Attendee phone lines are muted • Text any questions to Jitendra using Webex chat • Questions will be answered at the end of the call • Unanswered questions and answers in upcoming FAQ/blog post
  • 4. Big Data, Hadoop & Data Center Re-platforming. Business Drivers: • From reactive analytics to proactive interactions • Insights that drive competitive advantage & optimal returns. Financial Drivers: • Cost of data systems, as % of IT spend, continues to grow • Cost advantages of commodity hardware & open source software. Technical Drivers: • Data is growing exponentially & existing systems are overwhelmed • Predominantly driven by NEW types of data that can inform analytics. There is an inequitable balance between vendor and customer in the market
  • 5. Hadoop Value: New Types of Data. Clickstream: Capture and analyze website visitors’ data trails and optimize your website. Sensors: Discover patterns in data streaming automatically from remote sensors and machines. Server Logs: Research logs to diagnose process failures and prevent security breaches. Sentiment: Understand how your customers feel about your brand and products, right now. Geographic: Analyze location-based data to manage operations where they occur. Unstructured: Understand patterns in files across millions of web pages, emails, and documents
  • 6. A Shift from Reactive to Proactive Interactions. HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision). A shift in Advertising: From mass branding …to 1x1 Targeting. A shift in Financial Services: From Educated Investing …to Automated Algorithms. A shift in Healthcare: From mass treatment …to Designer Medicine. A shift in Retail: From static branding …to Real-time Personalization. A shift in Telco: From break then fix …to repair before break
  • 7. Enterprise Goals for the Modern Data Architecture • Consolidate siloed data sets structured and unstructured • Central data set on a single cluster • Multiple workloads across batch interactive and real time • Central services for security, governance and operation • Preserve existing investment in current tools and platforms • Single view of the customer, product, supply chain [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (RDBMS, EDW, MPP; YARN: Data Operating System with Batch, Interactive and Real-Time workloads on nodes 1 to N, over HDFS (Hadoop Distributed File System)); SOURCES (Existing Systems: CRM, ERP, Other; Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured)]
  • 8. YARN Transformed Hadoop & Opened a New Era. YARN: The Architectural Center of Hadoop • Common data platform, many applications • Support multi-tenant access & processing • Batch, interactive & real-time use cases [Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  • 9. YARN Extends Hadoop to Other Data Center Leaders. YARN: The Architectural Center of Hadoop • Common data platform, many applications • Support multi-tenant access & processing • Batch, interactive & real-time use cases • Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.) YARN Ready Applications: Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions [Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), Search (Solr), Others (ISV Engines), with Tez and Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  • 10. Enterprise Hadoop: Central Set of Services. Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for: • Governance • Operations • Security. Everything that plugs into Hadoop inherits these services. Governance: Load data and manage according to policy. Operations: Deploy and effectively manage the platform (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie). Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection. [Diagram labels: Batch, Interactive & Real-Time Data Access engines (Script: Pig; SQL: Hive; Java/Scala: Cascading; Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; In-Memory: Spark; Others: ISV Engines), with Tez and Slider layers, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)]
  • 11. Hortonworks Development Investment for the Enterprise: Vertical Integration with YARN and HDFS • Ensure engines can run reliably and respectfully in a YARN based cluster • Implement features throughout the stack to accommodate [Diagram: same Governance, Data Access, Security and Operations stack on YARN and HDFS as slide 10]
  • 12. Hortonworks Development Investment for the Enterprise: Horizontal Integration for Enterprise Services • Ensure consistent enterprise services are applied across the entire Hadoop stack • Integrate with and extend existing data center solutions for these key requirements [Diagram: same Governance, Data Access, Security and Operations stack on YARN and HDFS as slide 10]
  • 13. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2. YARN is the architectural center of HDP • Common data set across all applications • Batch, interactive & real-time workloads • Multi-tenant access & processing. Provides comprehensive enterprise capabilities • Governance • Security • Operations. Enables broad ecosystem adoption • ISVs can plug directly into Hadoop. The widest range of deployment options • Linux & Windows • On premises & cloud [Diagram labels: Governance (Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS); Batch, Interactive & Real-Time Data Access (Script: Pig; SQL: Hive; Java/Scala: Cascading; Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; In-Memory: Spark; Others: ISV Engines; Tez and Slider layers); Security (Authentication, Authorization, Audit, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox); Operations (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie); YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System); Deployment Choice: Linux, Windows, On-Premises, Cloud]
  • 14. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2. YARN is the architectural center of HDP • Common data set across all applications • Batch, interactive & real-time workloads • Multi-tenant access & processing. Provides comprehensive enterprise capabilities • Governance • Security • Operations. Enables broad ecosystem adoption • ISVs can plug directly into Hadoop. The widest range of deployment options • Linux & Windows • On premises & cloud [Diagram: same HDP 2.2 stack as slide 13]
  • 15. Overview of HDFS
  • 16. HDFS Enables the Common Data Platform. HDFS: Storage Platform for Modern Data Architecture • Common data platform across multiple application workloads • Reliable • Scalable • Cost efficient [Diagram labels: Batch, Interactive & Real-Time Data Access engines (Script: Pig; SQL: Hive; Java/Scala: Cascading; Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; In-Memory: Spark; Others: ISV Engines), with Tez and Slider layers, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)]
  • 17. HDFS Innovations on HDP 2.2
  • 18. HDFS in HDP 2.2: What's New. Theme: Heterogeneous Storage. Heterogeneous Storage • Archive and SSD tiers • Tech Preview: Enable intermediate data to be stored in memory. Theme: Security. Encryption • Tech Preview: Transparent Data Encryption. Theme: Security. DataNode does not require root to start • HDFS services in a Kerberized cluster no longer need root to start
  • 19. New in HDP 2.2: Heterogeneous Storage
  • 20. Heterogeneous Storage. Before • DataNode is a single storage • Storage is uniform: the only storage type is Disk • Storage types are hidden from the file system. New Architecture • DataNode is a collection of storages • Supports different types of storage: Disk, SSDs, Memory [Diagram labels: "All disks as a single storage" versus "Collection of tiered storages"; S3, Swift, SAN, Filers]
  • 21. HDFS Storage Architecture - Now
  • 22. Storage Policies: Archival • Hot: all replicas on DISK • Warm: 1 replica on DISK, others on ARCHIVE • Cold: all replicas on ARCHIVE [Diagram: HDP cluster with DISK and ARCHIVE storages]
  • 23. Storage Policy: SSD • DataSet A: all replicas on SSD [Diagram: HDP cluster of nodes, each with SSD and DISK storages; replicas of A placed on SSD]
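The placement rules on the two storage-policy slides can be sketched as a small lookup. This is an illustrative model only, not HDFS code: the function name `replica_storage_types` is hypothetical, the policy label `ALL_SSD` is assumed here for the "all replicas on SSD" case, and fallback behaviour when a tier is full or missing is deliberately left out.

```python
def replica_storage_types(policy: str, replication: int) -> list:
    """Return the storage type chosen for each replica under a policy.

    Simplified model of the slide text; real HDFS also defines fallback
    storage types, which this sketch ignores.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    policy = policy.upper()
    if policy == "HOT":        # all replicas on DISK
        return ["DISK"] * replication
    if policy == "WARM":       # 1 replica on DISK, others on ARCHIVE
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)
    if policy == "COLD":       # all replicas on ARCHIVE
        return ["ARCHIVE"] * replication
    if policy == "ALL_SSD":    # all replicas on SSD (slide 23)
        return ["SSD"] * replication
    raise ValueError("unknown policy: " + policy)


if __name__ == "__main__":
    for name in ("HOT", "WARM", "COLD", "ALL_SSD"):
        print(name, replica_storage_types(name, 3))
```

With a replication factor of 3, a Warm data set keeps one copy on faster disk and moves the other two to the archive tier, which is the trade-off the Archival slide illustrates.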
  • 24. Store Intermediate Data in Memory (Tech Preview feature). Flow: the application process writes a block to the memory tier (RAM_DISK); the block is then lazily persisted to disk. For data writes that: - Need low write latency - Produce data that can be regenerated
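A toy model of the write path on this slide, to show why the feature is aimed at data that can be regenerated. Everything here is hypothetical (the class `LazyPersistStore` and its methods are not HDFS APIs); it only mimics "acknowledge from memory, persist to disk later".

```python
class LazyPersistStore:
    """Toy model: writes land in memory first and reach disk later."""

    def __init__(self):
        self.memory = {}     # stands in for the RAM_DISK tier
        self.disk = {}       # stands in for persistent storage
        self._pending = []   # blocks written but not yet persisted

    def write_block(self, block_id, data):
        # The write returns as soon as the block is in memory (low latency).
        self.memory[block_id] = data
        self._pending.append(block_id)

    def persist_pending(self):
        # Background step: lazily copy pending blocks from memory to disk.
        flushed = 0
        while self._pending:
            block_id = self._pending.pop(0)
            self.disk[block_id] = self.memory[block_id]
            flushed += 1
        return flushed

    def simulate_restart(self):
        # Memory contents are lost on restart; only persisted blocks survive.
        self.memory.clear()
        self._pending.clear()

    def read_block(self, block_id):
        if block_id in self.memory:
            return self.memory[block_id]
        return self.disk.get(block_id)


if __name__ == "__main__":
    store = LazyPersistStore()
    store.write_block("blk_1", b"intermediate")
    store.simulate_restart()          # restart before the lazy persist ran
    print(store.read_block("blk_1"))  # the block is gone and must be regenerated
```

A block that is lost before it reaches disk has to be recomputed by the application, which is why the slide restricts the use case to regenerable data.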
  • 25. New in HDP 2.2: Encryption
  • 26. HDFS Transparent Data Encryption • HDFS Encryption – Transparent Encryption in HDFS – HDFS-6134 – Designate a directory as an encryption zone; all files in the zone are encrypted – Depends on the Key Management Server • Key Management Server – HADOOP-10433 – The custodian for all encryption keys in Hadoop – REST API for key CRUD operations • Key Provider API – HADOOP-10141 – API that lets Hadoop code (NN, DN, DFS clients) perform CRUD operations on key material
  • 27. HDFS Transparent Data Encryption [Diagram labels: HDFS Client (Crypto Stream, reads/writes with DEK); Name Node; KeyProvider API (HADOOP-10141); Key Management System (KMS, HADOOP-10433) holding DEKs and EZKs; Encryption Zone (attributes: EZKey ID, version), HDFS-6134; Encrypted File (attributes: EDEK, IV); EDEK and DEK exchanged between client, Name Node and KMS] Acronyms: EZ: Encryption Zone (an HDFS directory). EZK: Encryption Zone Key; master key associated with all files in an EZ. DEK: Data Encryption Key; unique key associated with each file; the EZ Key is used to generate the DEK. EDEK: Encrypted DEK; the Name Node only has access to the encrypted DEK. IV: Initialization Vector.
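The key hierarchy in the acronym table (EZK, DEK, EDEK, IV) can be sketched as follows. This is a toy model under loud assumptions: the classes `ToyKMS` and `ToyNameNode` and the client helpers are hypothetical, and the cipher is a SHA-256 counter keystream standing in for the real cipher, so it must not be used to protect data. Reusing one IV for both the EDEK and the file is also a simplification. The point is only to show who holds which key.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT the real cipher): XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in for the KMS: the only holder of zone keys (EZKs)."""

    def __init__(self):
        self._ezks = {}

    def create_zone_key(self, name):
        self._ezks[name] = os.urandom(32)

    def generate_edek(self, ezk_name):
        # A fresh DEK per file, returned only in encrypted form (EDEK) with its IV.
        dek, iv = os.urandom(32), os.urandom(16)
        return keystream_xor(self._ezks[ezk_name], iv, dek), iv

    def decrypt_edek(self, ezk_name, edek, iv):
        return keystream_xor(self._ezks[ezk_name], iv, edek)


class ToyNameNode:
    """Hypothetical NameNode model: stores EDEK and IV per file, never the DEK."""

    def __init__(self, kms):
        self._kms = kms
        self.zones = {}   # directory -> EZK name
        self.files = {}   # path -> (EZK name, EDEK, IV)

    def create_zone(self, directory, ezk_name):
        self.zones[directory] = ezk_name

    def create_file(self, path):
        zone = next(d for d in self.zones if path.startswith(d + "/"))
        ezk_name = self.zones[zone]
        edek, iv = self._kms.generate_edek(ezk_name)
        self.files[path] = (ezk_name, edek, iv)
        return self.files[path]


def client_write(kms, namenode, path, plaintext):
    ezk_name, edek, iv = namenode.create_file(path)
    dek = kms.decrypt_edek(ezk_name, edek, iv)   # client obtains the DEK from the KMS
    return keystream_xor(dek, iv, plaintext)     # ciphertext is what gets stored


def client_read(kms, namenode, path, ciphertext):
    ezk_name, edek, iv = namenode.files[path]
    dek = kms.decrypt_edek(ezk_name, edek, iv)
    return keystream_xor(dek, iv, ciphertext)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_zone_key("ezk1")
    nn = ToyNameNode(kms)
    nn.create_zone("/secure", "ezk1")
    stored = client_write(kms, nn, "/secure/report.txt", b"quarterly numbers")
    print(client_read(kms, nn, "/secure/report.txt", stored))
```

In this model the Name Node only ever sees the EDEK, matching the table's note that it has access to the encrypted DEK alone, while encryption and decryption of file data happen in the client.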
  • 28. New in HDP 2.2: Operational Security Enhancements
  • 29. DataNode Does Not Require Root. Enables organizations to run services without root privilege. For Kerberized clusters: • DataNode no longer needs to run as the Linux root user when starting • DataNode no longer needs to bind to privileged ports • DataNode uses SASL to secure block transfers between HDFS clients and DataNodes
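A small configuration check for the non-root setup described on this slide. The slide itself names no settings; the property names below (`dfs.data.transfer.protection`, `dfs.datanode.address`, `dfs.http.policy`) are taken from the Apache Hadoop secure-mode documentation as understood here and should be verified against the Hadoop version in use. The function `nonroot_datanode_issues` and the port values in the usage example are hypothetical.

```python
def nonroot_datanode_issues(conf: dict) -> list:
    """Return human-readable problems that would block a non-root DataNode.

    `conf` is a plain dict of hdfs-site.xml style properties. The property
    names are assumptions drawn from Hadoop's secure-mode documentation.
    """
    issues = []

    # SASL on the data transfer protocol replaces the privileged-port check.
    if conf.get("dfs.data.transfer.protection") not in {
        "authentication", "integrity", "privacy"
    }:
        issues.append("dfs.data.transfer.protection is not set to a SASL QOP")

    # The DataNode should listen on a non-privileged port (1024 or above).
    port_text = conf.get("dfs.datanode.address", "").rpartition(":")[2]
    if not port_text.isdigit() or int(port_text) < 1024:
        issues.append("dfs.datanode.address does not use a non-privileged port")

    # HTTP endpoints are expected to be HTTPS-only in this mode.
    if conf.get("dfs.http.policy") != "HTTPS_ONLY":
        issues.append("dfs.http.policy is not HTTPS_ONLY")

    return issues


if __name__ == "__main__":
    example = {
        "dfs.data.transfer.protection": "authentication",
        "dfs.datanode.address": "0.0.0.0:10019",   # illustrative port only
        "dfs.http.policy": "HTTPS_ONLY",
    }
    print(nonroot_datanode_issues(example))
```

An empty list means the three assumed preconditions hold; any other result names the setting to revisit.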
  • 30. Q & A
  • 31. Thank you! Learn more at: hortonworks.com/hadoop/hdfs/ Register for the remaining 4 Discover HDP 2.2 Webinars: Hortonworks.com/webinars