© Cloudera, Inc. All rights reserved.
5th Annual
The Essentials of Apache Hadoop
The What, Why and How to Meet Agency Objectives
Sarah Sproehnle, Vice President, Customer Success
Introduction
What is Apache Hadoop?
• Hadoop is a software framework for storing, processing, and analyzing “big data”
• Distributed
• Scalable
• Fault-tolerant
• Open source
A Large (and Growing) Ecosystem
(Figure: logos of ecosystem projects, including Impala)
About Cloudera
• The leader in Apache Hadoop-based software and services
• Founded in 2008 by leading experts on Hadoop
• Over 1000 employees
• Global operations spanning over 20 countries
• Provides support, consulting, training, and certification for Hadoop users
• Employs committers to virtually every significant Hadoop-related project
• Many authors of industry standard books on Apache Hadoop projects
• Tom White, Lars George, Kathleen Ting, etc.
CDH
• CDH (Cloudera’s Distribution, including Apache Hadoop)
• 100% open source, enterprise-ready distribution of Hadoop and related projects
• The most complete, tested, and widely deployed distribution of Hadoop
• Integrates all the key Hadoop ecosystem projects
Vendor Integration
(Figure: partner categories, namely Platform & Cloud, System Integration, Data Systems, and Software and OEM)
How it Works
Traditional Large-Scale Computation
Traditionally, computation has been processor-bound
• Relatively small amounts of data
• Lots of complex processing
The early solution: bigger computers
• Faster processor, more memory
• But even this couldn’t keep up
Distributed Systems
The better solution: more computers
• Distributed systems – use multiple machines for a single job
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, we didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.”
– Grace Hopper
Distributed Systems: The Data Bottleneck (1)
• Traditionally, data is stored in a central location
• Data is copied to processors at runtime
• Fine for limited amounts of data
Distributed Systems: The Data Bottleneck (2)
• Modern systems have much more data
• terabytes+ a day
• petabytes+ total
• We need a new approach…
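The data bottleneck is easy to quantify with back-of-the-envelope arithmetic. This sketch is illustrative only: the 10 TB dataset and the 100 MB/s per-disk read rate are assumptions, not figures from this talk, and network and CPU costs are ignored entirely.

```python
# Back-of-the-envelope scan time for a dataset.
# ASSUMPTIONS (hypothetical figures): 100 MB/s sustained read per machine,
# work split evenly, no network or CPU limits.

MB_PER_TB = 1_000_000  # decimal units: 1 TB = 10^6 MB

def read_time_hours(dataset_tb: float, throughput_mb_s: float, machines: int) -> float:
    """Hours to read dataset_tb when each machine reads its share in parallel."""
    total_mb = dataset_tb * MB_PER_TB
    seconds = total_mb / (throughput_mb_s * machines)
    return seconds / 3600

single = read_time_hours(10, 100, 1)     # one central machine
cluster = read_time_hours(10, 100, 100)  # 100 machines, each reading local data

print(f"1 machine:    {single:.1f} h")         # 27.8 h
print(f"100 machines: {cluster * 60:.1f} min")  # 16.7 min
```

The 100-machine figure only holds if each machine reads data already on its own disks. If all 100 had to pull data from one central store, that store's bandwidth would again be the limit, which is why moving the processing to the data matters.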
The Origins of Hadoop
• Hadoop is based on work done at Google in the late 1990s/early 2000s
• Google’s problem:
• Indexing the entire web requires massive amounts of storage
• A new approach was required to process such large amounts of data
• Google’s solution:
• GFS, the Google File System - described in a paper released in 2003
• Distributed MapReduce - described in a paper released in 2004
• Doug Cutting and others read these papers and implemented a similar, open source solution
• This is what would become Hadoop
What is Hadoop?
• Hadoop is a distributed data storage and processing platform
• Stores massive amounts of data in a very resilient way
• Distributes the processing to where the data is stored
• Tools built around Hadoop (the ‘Hadoop ecosystem’) can be configured/extended to handle many different tasks
• Extract Transform Load (ETL)
• BI environment
• Predictive analytics
• Statistical analysis
• Machine learning
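The idea of distributing processing to where the data is stored comes from the MapReduce model, which can be sketched in a single Python process. This is a conceptual illustration, not Hadoop's Java API: each inner list stands in for a block held on a different node, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Single-process sketch of the MapReduce model (word count).
# In a real cluster, map_phase would run on the node holding each block.

def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one partition."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group emitted values by key across all partitions."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

partitions = [
    ["the quick brown fox", "jumps over"],  # block on hypothetical node A
    ["the lazy dog", "The end"],            # block on hypothetical node B
]
counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(counts["the"])  # 3
```

Only the small (word, count) pairs cross between partitions in the shuffle; the raw text of each block is read where it lives.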
Hadoop is Scalable
• Adding nodes (machines) adds capacity proportionally
• Increasing load results in a graceful decline in performance
• Not failure of the system
(Chart: capacity vs. number of nodes)
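As a rough illustration of capacity growing proportionally with nodes, this sketch computes usable storage as a cluster grows. The 48 TB of raw disk per node is a hypothetical figure. The replication factor of 3 matches the HDFS default. Real clusters also reserve space for temporary and non-HDFS data, which is ignored here.

```python
# Usable HDFS-style capacity grows linearly with node count.
# ASSUMPTIONS: identical nodes, replication factor 3 (HDFS default),
# no space reserved for temp files, logs, or the OS.

def usable_capacity_tb(nodes: int, raw_tb_per_node: float, replication: int = 3) -> float:
    """Logical data that fits when every block is stored `replication` times."""
    if nodes < replication:
        # This sketch simply rejects clusters too small to put each replica
        # on a distinct node.
        raise ValueError("need at least as many nodes as replicas")
    return nodes * raw_tb_per_node / replication

for n in (10, 20, 40):
    print(n, usable_capacity_tb(n, 48))
```

Doubling the node count doubles usable capacity, which is the straight line the chart depicts.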
Hadoop is Fault Tolerant
• Node failure is inevitable
• What happens?
• System continues to function
• Master re-assigns work to a different node
• Data replication means there is no loss of data
• Nodes which recover rejoin the cluster automatically
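The data-replication point can be illustrated with a toy simulation. It models only storage, not the master re-assigning compute tasks. It uses random placement, whereas real HDFS placement is rack-aware, and it treats re-replication as instantaneous. Block and node names are made up.

```python
import random
from typing import Dict, List, Set

# Toy model of block replication and recovery (not real HDFS code).
REPLICATION = 3

def place_blocks(blocks: List[str], nodes: List[str], rng: random.Random) -> Dict[str, Set[str]]:
    """Store each block on REPLICATION distinct, randomly chosen nodes."""
    return {b: set(rng.sample(nodes, REPLICATION)) for b in blocks}

def fail_node(placement: Dict[str, Set[str]], dead: str, live: List[str],
              rng: random.Random) -> None:
    """Drop the dead node's replicas and copy each affected block to another live node."""
    for replicas in placement.values():
        if dead in replicas:
            replicas.discard(dead)
            candidates = [n for n in live if n not in replicas]
            replicas.add(rng.choice(candidates))

rng = random.Random(42)
nodes = [f"node{i}" for i in range(6)]
placement = place_blocks([f"blk_{i}" for i in range(10)], nodes, rng)

live = [n for n in nodes if n != "node0"]
fail_node(placement, "node0", live, rng)

# Every block is still readable and back at full replication, none on node0.
assert all(len(r) == REPLICATION and "node0" not in r for r in placement.values())
print("all blocks available after node0 failure")
```

Because every block starts with three copies on distinct nodes, losing any single node always leaves two readable copies, from which the missing third can be rebuilt.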
The Hadoop Ecosystem (1)
• Many tools have been developed around ‘Core Hadoop’
• Known as the Hadoop ecosystem
• Designed to make Hadoop easier to use, or to extend its functionality
• All are open source
• The ecosystem is growing all the time
The Hadoop Ecosystem (2)
• Examples of Hadoop ecosystem projects (all included in CDH):
• Spark: in-memory and streaming processing framework
• HBase: NoSQL database built on HDFS (the Hadoop Distributed File System)
• Hive: SQL processing engine designed for batch workloads
• Impala: SQL query engine designed for BI workloads
• Parquet: very efficient columnar data storage format
• Sqoop: data movement to/from RDBMSs
• Flume, Kafka: streaming data ingestion
• Solr: powerful text search functionality
• Hue: web-based user interface for Hadoop
• Sentry: authorization tool, providing security for Hadoop
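The advantage of a columnar format such as Parquet can be shown by contrasting the two layouts. This is a conceptual sketch, not the Parquet file format or any Parquet library. The field names and values are hypothetical.

```python
from typing import Dict, List

# Row layout vs. columnar layout (conceptual only; not the Parquet format).

rows = [
    {"id": 1, "state": "VA", "amount": 120.0},
    {"id": 2, "state": "MD", "amount": 75.5},
    {"id": 3, "state": "VA", "amount": 30.0},
]

def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column."""
    return {key: [r[key] for r in records] for key in records[0]}

columns = to_columns(rows)

# In a row-oriented file, summing one field means reading every whole record;
# here that is modeled by iterating over the full row dicts.
total_from_rows = sum(r["amount"] for r in rows)

# In a columnar file, the query reads only the column it needs. Columns of
# similar values (e.g. repeated states) also tend to compress well.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Analytic queries typically touch a few columns of wide tables, so skipping unneeded columns on disk is where most of the savings come from.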
Hadoop in the Real World
Five Really Popular Use Cases
• ETL Processing
• Business Intelligence
• Predictive Analytics
• Enterprise Data Hub
• Low-cost Storage of Large Data Volumes
# 1 – Traditional ETL Processing
• ETL: Extract, Transform, Load
• Challenges:
• Too much data
• Takes too long
• Too costly
(Diagram: Transactional Systems, Data Stream, ETL Grid, Data Warehouse, BI Tools, Networked Storage, Tape Storage)
# 1 – ETL Processing with Hadoop
• Hadoop cluster is used for ETL
• Often now ELT: Extract, Load, then Transform
• Structured and unstructured data is moved into the cluster
• Once processed, data can be analyzed in Hadoop or moved to the enterprise data warehouse (EDW)
(Diagram: Data Stream, Networked Storage, ETL, Analytics, Real-time, Enterprise Data Warehouse)
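The shift from ETL to ELT (load raw data first, transform later) can be sketched in a few lines. A Python list stands in for a raw landing area in the cluster. The record fields ("ts", "user", "bytes") and function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple

# Minimal ELT sketch: load raw records unchanged, transform at read time.

raw_zone = []  # stand-in for a raw landing directory in the cluster

def load(raw_line: str) -> None:
    """Load step: keep the original text, even malformed or unexpected records."""
    raw_zone.append(raw_line)

def transform(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Transform step, applied when a question is asked: parse, validate, project."""
    for line in lines:
        try:
            rec = json.loads(line)
            yield rec["user"], int(rec["bytes"])
        except (ValueError, KeyError, TypeError):
            continue  # skipped for this query, but still kept in raw_zone

for line in ['{"ts": 1, "user": "a", "bytes": "10"}',
             '{"ts": 2, "user": "b"}',
             'not json']:
    load(line)

print(list(transform(raw_zone)))  # [('a', 10)]
print(len(raw_zone))              # 3
```

Because nothing is discarded at load time, a later question can apply a different transform to the same raw records.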
# 1 – Vendor Integration - ETL
• For more information visit: http://www.cloudera.com/partners/partners-listing.html
# 2 – Traditional Business Intelligence
• BI traditionally takes place at the data warehouse layer
• Problem: EDW can’t keep up with growing data volumes
• Performance declines
• Increasing capacity can be very expensive
• Archived data is not available for analysis
(Diagram: Transactional Systems, Data Stream, ETL Grid, Data Warehouse, BI Tools, Networked Storage, Tape Storage)
# 2 – Business Intelligence with Hadoop
• BI tools can use Hadoop for much of their work
• Analyze all the data
• Use the EDW for the tasks for which it is best suited
(Diagram: Data Stream, Networked Storage, ETL, Analytics, Real-time, Enterprise Data Warehouse, BI Tools)
# 2 – Vendor Integration - BI
• For more information visit: http://www.cloudera.com/partners/partners-listing.html
# 3 – Predictive Analytics
• Predictive Analytics (Eckerson Group definition)
• The use of statistical or machine learning models to discover patterns and relationships in data that can help business people predict future behavior or activity
• The Hadoop platform can run analytic workloads on large volumes of diverse data
• Statistical models can be created and run inside the Hadoop environment
• Entire data sets can be used to create models
• There is no need to sample data
• Hadoop provides an environment that makes self-service analytics possible
• No need for ETL developers to stage data for data scientists
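The claim that entire data sets can be used to build models, with no sampling, can be illustrated with a simple pattern: each partition computes small summary statistics where it lives, and only those summaries are combined. This sketch assumes one-feature ordinary least-squares regression. The data and function names are hypothetical, and in-memory lists stand in for blocks on different nodes.

```python
from functools import reduce
from typing import Iterable, Tuple

# n, sum(x), sum(y), sum(x*y), sum(x*x) for one partition of the data
Stats = Tuple[float, float, float, float, float]

def partition_stats(points: Iterable[Tuple[float, float]]) -> Stats:
    """Computed locally on the node that holds this partition."""
    n = sx = sy = sxy = sxx = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return n, sx, sy, sxy, sxx

def combine(a: Stats, b: Stats) -> Stats:
    """Summaries simply add, so partitions can be merged in any order."""
    return tuple(p + q for p, q in zip(a, b))

def fit(stats: Stats) -> Tuple[float, float]:
    """Least squares for y = intercept + slope * x.
    Needs at least two distinct x values, otherwise the denominator is zero."""
    n, sx, sy, sxy, sxx = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical data following y = 2 + 3x exactly, split across two "nodes".
partitions = [
    [(0.0, 2.0), (1.0, 5.0), (2.0, 8.0)],
    [(3.0, 11.0), (4.0, 14.0), (5.0, 17.0)],
]
total = reduce(combine, (partition_stats(p) for p in partitions))
print(fit(total))  # (2.0, 3.0)
```

Every row contributes to the fit, yet only five numbers per partition move across the network. Real libraries use more numerically robust methods for larger models, but the divide-summarize-combine shape is the same.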
# 3 - Predictive Analytics: Cerner Corporation
Cerner Corporation
• Healthcare IT space
• Solutions and Services - Used by 54,000 medical facilities around the world
The problem
• Healthcare data is fragmented and lives in silos
• The data was used for historical reporting
The solution
• Build a comprehensive view of population health using a single platform
• Use predictive analytics to
• Improve patient outcomes
• Increase efficiency / Reduce costs
# 3 – Vendor Integration – Predictive Analytics
• For more information visit: http://www.cloudera.com/partners/partners-listing.html
# 4 – The Need for the Enterprise Data Hub
• Thousands of employees and lots of inaccessible information
• Heterogeneous legacy IT infrastructure
• Silos of multi-structured data that are difficult to integrate
(Diagram: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources; Data Archives; EDWs, Marts, Search Servers, Document Stores, Storage)
# 4 – The Enterprise Data Hub: One Unified System
• Information and data accessible by all for insight using leading tools and apps
• Enterprise Data Hub: unified data management infrastructure
• Ingest all data: any type, any scale, from any source
(Diagram: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources; EDWs, Marts, Storage, Search Servers, Documents, Archives; all brought together in the EDH)
# 5 - Low-cost Data Storage (1)
• Hadoop combines industry-standard hardware and a fault-tolerant architecture
• This combination provides a very cost-effective data storage platform
• The data stored on Hadoop is protected from loss by HDFS
• Data replication protects against data loss when individual disks or nodes fail
• The self-healing nature of Hadoop helps keep data available when you need it
• Hadoop enables users to store data that was previously discarded because it was too costly to keep
• Transactional
• Social media
• Sensor
• Clickstream
# 5 - Low-cost Data Storage (2)
• The low cost of HDFS storage enables the following use cases:
• Enterprise Data Hub (EDH)
• Active data archive
• Staging area for data warehouses
• Staging area for analytics store
• Sandbox for data discovery
• Sandbox for analytics
In Summary, Why Do You Need Hadoop?
• More data is coming
• Internet of Things
• Sensor data
• Streaming
• More data means bigger questions, better answers
• Hadoop easily scales to store and handle all of your data
• Hadoop is cost-effective
• Provides a significant cost-per-terabyte saving over traditional, legacy systems
• Hadoop integrates with your existing datacenter components
• Answer questions that you previously could not ask
Thank You
Unlock Hadoop Success with Cloudera Navigator Optimizer
Cloudera, Inc.
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
Seeling Cheung
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Uri Laserson
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
markgrover
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
Learntek1
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
Riccardo Romani
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Steven Totman
 
Ad

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
Cloudera, Inc.
 
Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

Recently uploaded (20)

Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Salesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdfSalesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdf
SRINIVASARAO PUSULURI
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Mastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core PillarsMastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core Pillars
Marcel David
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Sales Deck SentinelOne Singularity Platform.pptx
Sales Deck SentinelOne Singularity Platform.pptxSales Deck SentinelOne Singularity Platform.pptx
Sales Deck SentinelOne Singularity Platform.pptx
EliandoLawnote
 
Adobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install IllustratorAdobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install Illustrator
usmanhidray
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Adobe Photoshop Lightroom CC 2025 Crack Latest Version
Adobe Photoshop Lightroom CC 2025 Crack Latest VersionAdobe Photoshop Lightroom CC 2025 Crack Latest Version
Adobe Photoshop Lightroom CC 2025 Crack Latest Version
usmanhidray
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Salesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdfSalesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdf
SRINIVASARAO PUSULURI
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Mastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core PillarsMastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core Pillars
Marcel David
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Sales Deck SentinelOne Singularity Platform.pptx
Sales Deck SentinelOne Singularity Platform.pptxSales Deck SentinelOne Singularity Platform.pptx
Sales Deck SentinelOne Singularity Platform.pptx
EliandoLawnote
 
Adobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install IllustratorAdobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install Illustrator
usmanhidray
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Adobe Photoshop Lightroom CC 2025 Crack Latest Version
Adobe Photoshop Lightroom CC 2025 Crack Latest VersionAdobe Photoshop Lightroom CC 2025 Crack Latest Version
Adobe Photoshop Lightroom CC 2025 Crack Latest Version
usmanhidray
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 

Hadoop Essentials -- The What, Why and How to Meet Agency Objectives

5th Annual

The Essentials of Apache Hadoop
The What, Why and How to Meet Agency Objectives
Sarah Sproehnle, Vice President, Customer Success

Introduction

What is Apache Hadoop?
• Hadoop is a software framework for storing, processing, and analyzing “big data”
  • Distributed
  • Scalable
  • Fault-tolerant
  • Open source

A Large (and Growing) Ecosystem
(Diagram of ecosystem projects, including Impala)

About Cloudera
• The leader in Apache Hadoop-based software and services
• Founded in 2008 by leading experts on Hadoop
• Over 1000 employees
• Global operations spanning over 20 countries
• Provides support, consulting, training, and certification for Hadoop users
• Employs committers to virtually every significant Hadoop-related project
• Many authors of industry standard books on Apache Hadoop projects
  • Tom White, Lars George, Kathleen Ting, etc.

CDH
• CDH (Cloudera’s Distribution, including Apache Hadoop)
  • 100% open source, enterprise-ready distribution of Hadoop and related projects
  • The most complete, tested, and widely deployed distribution of Hadoop
  • Integrates all the key Hadoop ecosystem projects

Vendor Integration
(Diagram of partner categories: Platform & Cloud, System Integration, Data Systems, Software and OEM)

How it Works

Traditional Large-Scale Computation
Traditionally, computation has been processor-bound
• Relatively small amounts of data
• Lots of complex processing
The early solution: bigger computers
• Faster processor, more memory
• But even this couldn’t keep up

Distributed Systems
The better solution: more computers
• Distributed systems: use multiple machines for a single job
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, we didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” – Grace Hopper

Distributed Systems: The Data Bottleneck (1)
• Traditionally, data is stored in a central location
• Data is copied to processors at runtime
• Fine for limited amounts of data

Distributed Systems: The Data Bottleneck (2)
• Modern systems have much more data
  • Terabytes+ a day
  • Petabytes+ total
• We need a new approach…
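
A rough calculation shows why copying data to a central processor becomes the bottleneck. The sketch below compares two ideal cases. In the first, one terabyte is shipped over a single network link. In the second, the same terabyte is spread across many nodes and each node reads only its own share from local disk. The figures (1 TB, a 1 Gb/s link, 100 nodes, 100 MB/s per disk) are illustrative assumptions and do not come from the slides. Both functions ignore protocol overhead and contention.

```python
TB = 10**12          # decimal terabyte, in bytes
GBIT_PER_S = 10**9   # 1 Gb/s network link, in bits per second
MB_PER_S = 10**6     # 1 MB/s, in bytes per second


def central_copy_seconds(data_bytes: float, link_bits_per_s: float) -> float:
    # Ideal time to ship all data over one link to the processors
    # (no protocol overhead, no contention).
    return data_bytes * 8 / link_bits_per_s


def local_parallel_read_seconds(data_bytes: float, nodes: int, disk_bytes_per_s: float) -> float:
    # Ideal time when each node reads only its own equal share from local disk,
    # with all nodes reading in parallel.
    return data_bytes / nodes / disk_bytes_per_s


if __name__ == "__main__":
    print(f"central copy: {central_copy_seconds(TB, GBIT_PER_S) / 3600:.1f} h")
    print(f"local reads:  {local_parallel_read_seconds(TB, 100, 100 * MB_PER_S):.0f} s")
```

Under these assumptions the central copy takes about 2.2 hours, while the parallel local reads take about 100 seconds. That gap is the motivation for moving computation to the data.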

The Origins of Hadoop
• Hadoop is based on work done at Google in the late 1990s/early 2000s
• Google’s problem:
  • Indexing the entire web requires massive amounts of storage
  • A new approach was required to process such large amounts of data
• Google’s solution:
  • GFS, the Google File System, described in a paper released in 2003
  • Distributed MapReduce, described in a paper released in 2004
• Doug Cutting and others read these papers and implemented a similar, open source solution
  • This is what would become Hadoop
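
The MapReduce model mentioned above splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. The sketch below is single-process Python that runs the classic word-count example. It assumes in-memory lists stand in for the input splits that a real cluster would spread across nodes. It is not Hadoop's Java MapReduce API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # On a real cluster, each split would be processed by a separate mapper.
    intermediate = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count([["the quick brown fox"], ["the lazy dog", "The end"]]))
```

Because map and reduce each operate on independent pieces, a framework can run many copies of each phase in parallel and rerun any piece that fails.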

What is Hadoop?
• Hadoop is a distributed data storage and processing platform
  • Stores massive amounts of data in a very resilient way
  • Distributes the processing to where the data is stored
• Tools built around Hadoop (the ‘Hadoop ecosystem’) can be configured/extended to handle many different tasks
  • Extract, Transform, Load (ETL)
  • BI environment
  • Predictive analytics
  • Statistical analysis
  • Machine learning
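
“Distributes the processing to where the data is stored” means the scheduler prefers to run each task on a node that already holds a copy of the task's input block. The sketch below is a hypothetical greedy assigner that illustrates this idea. The function name and data shapes are assumptions, and it is not the YARN or MapReduce scheduler API, which also weighs rack locality, queues and resource limits. It picks the least-loaded node holding a replica and falls back to a remote read only when no such node exists.

```python
from typing import Dict, List, Set


def assign_tasks(block_locations: Dict[str, Set[str]], nodes: List[str]) -> Dict[str, str]:
    """Greedy, locality-aware assignment of one task per block (hypothetical helper).

    Prefers the least-loaded node that already stores a replica of the block;
    otherwise falls back to the least-loaded node overall (a remote read).
    """
    load = {node: 0 for node in nodes}
    assignment: Dict[str, str] = {}
    for block, holders in sorted(block_locations.items()):
        local = [n for n in nodes if n in holders]
        candidates = local if local else nodes
        # Ties are broken by node name so the result is deterministic.
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    locations = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"}, "b3": {"n1", "n3"}}
    print(assign_tasks(locations, ["n1", "n2", "n3"]))
```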

Hadoop is Scalable
• Adding nodes (machines) adds capacity proportionally
• Increasing load results in a graceful decline in performance
  • Not failure of the system
(Chart: capacity vs. number of nodes)

Hadoop is Fault Tolerant
• Node failure is inevitable
• What happens?
  • The system continues to function
  • The master reassigns work to a different node
  • Data replication means there is no loss of data
  • Nodes that recover rejoin the cluster automatically
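
The replication behavior described above can be simulated in a few lines. The sketch stores three copies of each block on distinct nodes, which matches HDFS's default replication factor of 3 (`dfs.replication`). It then “kills” one node and re-replicates every block that fell below three copies. The function names are hypothetical, and placement here is simply random. Real HDFS uses a rack-aware placement policy, and its NameNode detects failures through missed heartbeats.

```python
import random
from typing import Dict, List, Set

REPLICATION = 3  # HDFS's default replication factor (dfs.replication)


def place_blocks(blocks: List[str], nodes: List[str], rf: int = REPLICATION,
                 seed: int = 0) -> Dict[str, Set[str]]:
    # Store rf copies of each block on rf distinct nodes (random placement;
    # real HDFS is rack-aware).
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, rf)) for block in blocks}


def handle_node_failure(placement: Dict[str, Set[str]], dead: str, nodes: List[str],
                        rf: int = REPLICATION, seed: int = 1) -> Dict[str, Set[str]]:
    # Forget the dead node's replicas, then copy each under-replicated block
    # from a surviving replica onto other live nodes until rf copies exist again.
    rng = random.Random(seed)
    live = [n for n in nodes if n != dead]
    healed: Dict[str, Set[str]] = {}
    for block, holders in placement.items():
        survivors = holders - {dead}
        if not survivors:
            raise RuntimeError(f"all replicas of {block} were lost")
        spare = [n for n in live if n not in survivors]
        missing = min(rf - len(survivors), len(spare))
        healed[block] = survivors | set(rng.sample(spare, missing))
    return healed


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(5)]
    placement = place_blocks([f"blk_{i}" for i in range(8)], nodes)
    for block, holders in sorted(handle_node_failure(placement, "node0", nodes).items()):
        print(block, sorted(holders))
```

Data is lost only if every replica of a block disappears before re-replication completes. This is why the replication factor and placement across failure domains matter.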

The Hadoop Ecosystem (1)
• Many tools have been developed around ‘Core Hadoop’
  • Known as the Hadoop ecosystem
• Designed to make Hadoop easier to use, or to extend its functionality
• All are open source
• The ecosystem is growing all the time

The Hadoop Ecosystem (2)
• Examples of Hadoop ecosystem projects (all included in CDH), with what each does:
  • Spark: In-memory and streaming processing framework
  • HBase: NoSQL database built on HDFS
  • Hive: SQL processing engine designed for batch workloads
  • Impala: SQL query engine designed for BI workloads
  • Parquet: Very efficient columnar data storage format
  • Sqoop: Data movement to/from RDBMSs
  • Flume, Kafka: Streaming data ingestion
  • Solr: Powerful text search functionality
  • Hue: Web-based user interface for Hadoop
  • Sentry: Authorization tool, providing security for Hadoop
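
Parquet is described above as a columnar format. The core idea is that values from the same column are stored together. A query that needs one column then reads only that column, and runs of similar values compress well. The sketch below shows only this concept, in plain Python. It does not implement the Parquet file format or use any Parquet library. Real Parquet adds row groups, pages, dictionary and run-length encodings, and compression codecs.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot row-oriented records into one list per column (the core idea of a columnar layout).
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    # Collapse consecutive repeats; similar values stored side by side compress well.
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"state": "CA", "amount": 10},
        {"state": "CA", "amount": 5},
        {"state": "NY", "amount": 7},
    ]
    cols = to_columns(rows)
    print(sum(cols["amount"]))               # an aggregate touches only the "amount" column
    print(run_length_encode(cols["state"]))  # repeated values collapse into (value, count) runs
```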

Hadoop in the Real World

Five Really Popular Use Cases
• ETL Processing
• Business Intelligence
• Predictive Analytics
• Enterprise Data Hub
• Low-cost Storage of Large Data Volumes

# 1 – Traditional ETL Processing
• ETL: Extract, Transform, Load
• Challenges:
  • Too much data
  • Takes too long
  • Too costly
(Diagram components: Transactional Systems, Data Stream, Networked Storage, ETL Grid, Data Warehouse, BI Tools, Tape Storage)

# 1 – ETL Processing with Hadoop
• A Hadoop cluster is used for ETL
  • Often now ELT: Extract, Load, then Transform
• Structured and unstructured data is moved into the cluster
• Once processed, data can be analyzed in Hadoop or moved to the EDW
(Diagram components: Networked Storage, Data Stream, Enterprise Data Warehouse, ETL, Analytics, Real-time)
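
In the ELT pattern described above, raw data lands in the cluster unchanged and is transformed afterwards, so the original records stay available for reprocessing. The sketch below illustrates this ordering. Its assumptions: the source is CSV text, the “landing zone” is a plain Python list standing in for storage in the cluster, and the `id`/`amount` fields are hypothetical examples.

```python
import csv
import io
from typing import Dict, List


def extract(source: str) -> List[str]:
    # Pull raw records from a source system (here, lines of CSV text).
    return source.strip().splitlines()


def load(raw_lines: List[str], landing_zone: List[str]) -> None:
    # ELT: land the raw data unchanged, before any transformation.
    landing_zone.extend(raw_lines)


def transform(landing_zone: List[str]) -> List[Dict[str, object]]:
    # Transform later, inside the cluster: parse, cast types, skip bad rows.
    reader = csv.DictReader(io.StringIO("\n".join(landing_zone)))
    result: List[Dict[str, object]] = []
    for row in reader:
        try:
            result.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, TypeError, ValueError):
            continue  # the raw copy remains in the landing zone for later fixes
    return result


if __name__ == "__main__":
    zone: List[str] = []
    load(extract("id,amount\n1,9.5\n2,oops\n3,4\n"), zone)
    print(transform(zone))
```

Because the bad row (`2,oops`) was loaded before transformation, it is still in the landing zone and can be fixed and reprocessed later. Classic ETL would have discarded it before loading.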

# 1 – Vendor Integration – ETL
• For more information visit: http://www.cloudera.com/partners/partners-listing.html

# 2 – Traditional Business Intelligence
• BI traditionally takes place at the data warehouse layer
• Problem: the EDW can’t keep up with growing data volumes
  • Performance declines
  • Increasing capacity can be very expensive
  • Archived data is not available for analysis
(Diagram components: Transactional Systems, Data Stream, Networked Storage, ETL Grid, Data Warehouse, BI Tools, Tape Storage)

# 2 – Business Intelligence with Hadoop
• BI tools can use Hadoop for much of their work
  • Analyze all the data
  • Use the EDW for the tasks for which it is best suited
(Diagram components: Networked Storage, Data Stream, Enterprise Data Warehouse, ETL, Analytics, Real-time, BI Tools)

# 2 – Vendor Integration – BI
• For more information visit: http://www.cloudera.com/partners/partners-listing.html

# 3 – Predictive Analytics
• Predictive analytics (Eckerson Group definition):
  • The use of statistical or machine learning models to discover patterns and relationships in data that can help business people predict future behavior or activity
• The Hadoop platform can run analytic workloads on large volumes of diverse data
  • Statistical models can be created and run inside the Hadoop environment
  • Entire data sets can be used to create models
  • There is no need to sample data
• Hadoop provides an environment that makes self-service analytics possible
  • No need for ETL developers to stage data for data scientists
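
Fitting a model on an entire data set, rather than a sample, is practical when the model can be built from statistics that are summed per partition and then merged. The sketch below shows this for simple linear regression. Each partition computes n, Σx, Σy, Σx² and Σxy in one pass (a map-like step). The partial sums are added together (a reduce-like step), and the standard ordinary-least-squares closed form gives the line. This is one illustrative technique in plain Python. It is not a Hadoop or Spark MLlib API, and the partition data is made up.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Sufficient statistics for simple linear regression: n, Σx, Σy, Σx², Σxy
Stats = Tuple[float, float, float, float, float]


def partial_stats(points: Iterable[Tuple[float, float]]) -> Stats:
    # "Map" step: summarize one partition of the data in a single pass.
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def merge(a: Stats, b: Stats) -> Stats:
    # "Reduce" step: sums from different partitions simply add up.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_line(stats: Stats) -> Tuple[float, float]:
    # Ordinary least squares for y = intercept + slope * x, using every record.
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    partitions: List[List[Tuple[float, float]]] = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    total = reduce(merge, (partial_stats(p) for p in partitions))
    print(fit_line(total))  # intercept and slope fitted on all four points
```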

# 3 – Predictive Analytics: Cerner Corporation
Cerner Corporation
• Healthcare IT space
• Solutions and services used by 54,000 medical facilities around the world
The problem
• Healthcare data is fragmented and lives in silos
• The data was used for historical reporting
The solution
• Build a comprehensive view of population health using a single platform
• Use predictive analytics to:
  • Improve patient outcomes
  • Increase efficiency / reduce costs

# 3 – Vendor Integration – Predictive Analytics
• For more information visit: http://www.cloudera.com/partners/partners-listing.html
  • 31. # 4 – The Need for the Enterprise Data Hub • Thousands of employees and lots of inaccessible information • Heterogeneous legacy IT infrastructure • Silos of multi-structured data that are difficult to integrate [Diagram labels: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources; Data Archives; EDWs; Marts; Search Servers; Document Stores; Storage]
  • 32. # 4 – The Enterprise Data Hub: One Unified System • Information and data accessible by all for insight, using leading tools and apps • Enterprise Data Hub: unified data management infrastructure • Ingest all data: any type, any scale, from any source [Diagram labels: EDH; EDWs; Marts; Storage; Search Servers; Documents; Archives; ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources]
  • 33. # 5 – Low-cost Data Storage (1) • Hadoop combines industry-standard hardware and a fault-tolerant architecture • This combination provides a very cost-effective data storage platform • HDFS protects the data stored on Hadoop from loss • HDFS replicates each data block on multiple nodes (three copies by default), so the failure of a single disk or node does not lose data • Hadoop is self-healing: when a node fails, HDFS re-replicates its blocks from the surviving copies, keeping data available when you need it • Hadoop enables users to store data that was previously discarded because it was too expensive to keep • Transactional • Social media • Sensor • Clickstream
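To make the replication and self-healing points concrete, here is a toy Python model. It is not HDFS code: it assumes a simple round-robin placement of three replicas on distinct nodes, whereas real HDFS uses a rack-aware placement policy and re-replication managed by the NameNode. The function names (`place_replicas`, `readable_blocks`, `re_replicate`) and node names are hypothetical, chosen for illustration only.

```python
REPLICATION_FACTOR = 3  # matches the HDFS default replication factor


def place_replicas(blocks, nodes, replication=REPLICATION_FACTOR):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        # Start at a different node for each block so replicas spread across the cluster.
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {b for b, locs in placement.items()
            if any(n not in failed_nodes for n in locs)}


def re_replicate(placement, nodes, failed_nodes, replication=REPLICATION_FACTOR):
    """Toy 'self-healing': copy each surviving block back up to `replication` live nodes."""
    live = [n for n in nodes if n not in failed_nodes]
    healed = {}
    for block, locs in placement.items():
        survivors = [n for n in locs if n in live]
        if not survivors:
            continue  # every replica was lost; this block cannot be recovered
        candidates = [n for n in live if n not in survivors]
        healed[block] = survivors + candidates[: replication - len(survivors)]
    return healed


# Hypothetical five-node cluster holding ten blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = [f"blk_{i}" for i in range(10)]
placement = place_replicas(blocks, nodes)

failed = {"node2", "node4"}  # two nodes fail at the same time
print(len(readable_blocks(placement, failed)), "of", len(blocks), "blocks still readable")
# prints: 10 of 10 blocks still readable

healed = re_replicate(placement, nodes, failed)
print(all(len(locs) == REPLICATION_FACTOR for locs in healed.values()))
# prints: True
```

With three replicas on distinct nodes, any two simultaneous node failures still leave at least one copy of every block; losing three nodes at once can lose blocks, which is why real clusters re-replicate promptly after a failure.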
  • 34. # 5 – Low-cost Data Storage (2) • The low cost of HDFS storage enables the following use cases: • Enterprise Data Hub (EDH) • Active data archive • Staging area for data warehouses • Staging area for analytics store • Sandbox for data discovery • Sandbox for analytics
  • 35. In Summary, Why Do You Need Hadoop? • More data is coming • Internet of Things • Sensor data • Streaming • More data means bigger questions, better answers • Hadoop easily scales to store and handle all of your data • Hadoop is cost-effective • Provides a significant cost-per-terabyte saving over traditional, legacy systems • Hadoop integrates with your existing datacenter components • Answer questions that you previously could not ask
  • 36. Thank You

Editor's Notes

  • #9: All this said, Hadoop alone is not sufficient to help you succeed. You have existing investments in technology and vendors you already rely upon. Cloudera partners more broadly and deeply across the Hadoop ecosystem than any other vendor. With more than 1,900 partners across the entire stack, customers can choose where and how they deploy their big data platform. Through integrations with existing technologies, we provide our customers with compatibility with their existing tools and skills. Our special relationship with Intel, through its strategic investment in Cloudera, delivers unique advantages to our customers: Innovation: Hadoop is evolving at an unprecedented pace. New projects are constantly being added, sometimes at the expense of performance, security, and quality. Aligning hardware and software roadmaps accelerates the pace of innovation without compromising quality. The most secure, best-performing new innovations are delivered faster. Ecosystem: Data architectures are complicated, requiring multiple products from multiple vendors. Oftentimes, stitching together solutions leads to one-off projects that limit value to the business. Building modern architectures requires economies of scale: the ability to align independent providers along common goals for enterprise readiness. Intel and Cloudera have deep partnerships with leading vendors so you can build solutions with confidence. Reliability: New technology segments are hard to predict. Bet on the wrong one and you slow your business down trying to integrate with existing investments or, worse, run the risk of having to start from scratch, losing both time and money.