Harnessing Data-in-Motion
with Hortonworks DataFlow
Introduction to HDF 2.0
Haimo Liu
Product Manager
Aldrin Piri
Technical Staff
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
HDF 2.0: Flow Management
– NiFi basics
– NiFi use cases
– NiFi demos
HDF 2.0: Streaming Analytics
Simplistic View of Enterprise Data Flow
Data Flow
Process and Analyze Data
Acquire Data
Store Data
Interacting with different business partners and customers
Realistic View of Enterprise Data Flow
• Visual Command and Control: for agile and immediate creation, configuration, and control of dataflows
• Data Lineage (Provenance): ensures trust of your data
• Data Prioritization: because not all data is of equal importance
• Data Buffering/Back-Pressure: since not all senders/receivers/connections work perfectly all the time
• Control Latency vs Throughput: adapt to different situations with different requirements
• Secure Control Plane/Data Plane: security of data and data access
• Scale-out Clustering: scalability
• Extensibility: ecosystem flexibility and growth
Apache NiFi: Designed for 8 challenges of global enterprise dataflow
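The buffering and back-pressure challenge listed above can be illustrated with a small sketch. This is not NiFi code: `BoundedConnection`, `offer`, `poll`, and `max_queued` are hypothetical names chosen for illustration, and the only assumption is that a queue refuses new work once a configured threshold is reached so the upstream sender has to slow down.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical queue between two steps that applies back-pressure when full."""

    def __init__(self, max_queued: int) -> None:
        self.max_queued = max_queued
        self._queue: deque = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, item) -> bool:
        """Accept the item unless the threshold is reached; a refused caller retries later."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Hand the oldest item to the consumer, or None if nothing is queued."""
        return self._queue.popleft() if self._queue else None


conn = BoundedConnection(max_queued=2)
accepted = [conn.offer(n) for n in range(3)]  # the third offer is refused
conn.poll()                                   # the consumer drains one item
accepted_after_drain = conn.offer(3)          # the producer may proceed again
```

The producer never blocks or drops data here; it simply learns that the downstream side is saturated and can decide how to wait.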
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations
Use Cases for Apache NiFi
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
Terminology
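The three terms can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the NiFi API: the class names mirror the slide, while `add_size`, `set_prioritizer`, and the `priority` attribute are hypothetical helpers invented for the example.

```python
import heapq
from dataclasses import dataclass, field, replace
from itertools import count
from typing import Callable, Dict


@dataclass(frozen=True)
class FlowFile:
    """Unit of data: content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def add_size(ff: FlowFile) -> FlowFile:
    """A toy processor: returns a new FlowFile with a fileSize attribute added."""
    return replace(ff, attributes={**ff.attributes, "fileSize": str(len(ff.content))})


class Connection:
    """A queue between processors whose ordering rule can be swapped at run time."""

    def __init__(self, prioritizer: Callable[[FlowFile], int]) -> None:
        self._prioritizer = prioritizer
        self._heap = []
        self._seq = count()  # tie-breaker, so FlowFiles themselves are never compared

    def put(self, ff: FlowFile) -> None:
        heapq.heappush(self._heap, (self._prioritizer(ff), next(self._seq), ff))

    def take(self) -> FlowFile:
        return heapq.heappop(self._heap)[2]

    def set_prioritizer(self, prioritizer: Callable[[FlowFile], int]) -> None:
        """Re-queue everything, in arrival order, under the new ordering rule."""
        queued = [entry[2] for entry in sorted(self._heap, key=lambda e: e[1])]
        self._prioritizer = prioritizer
        self._heap = []
        for ff in queued:
            self.put(ff)


def by_priority_attribute(ff: FlowFile) -> int:
    return int(ff.attributes.get("priority", "9"))


def smallest_first(ff: FlowFile) -> int:
    return len(ff.content)


queue = Connection(by_priority_attribute)
queue.put(add_size(FlowFile(b"routine", {"priority": "5"})))
queue.put(add_size(FlowFile(b"urgent alert", {"priority": "1"})))
first = queue.take()                   # lowest priority number wins
queue.put(first)
queue.set_prioritizer(smallest_first)  # change the rule while data is queued
first_after_swap = queue.take()        # now the smallest content wins
```

Keeping the FlowFile immutable means a processor produces a new object rather than editing one in place, which makes the effect of each step easy to reason about.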
Analogy: FlowFiles are like HTTP Data
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Content-Type: text/html
Content:
Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX
FlowFile
Header:
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
0101010101110101010101010101 (Binary)
1. Drag and drop processors to build a flow
2. Start, stop, and configure components in real time
3. View errors and corresponding error messages
4. View statistics and health of data flow
5. Create templates of common processors & connections
Create, Run, View, Start, Stop, Change, Fix Dataflows in Real-Time
Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
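The "Route on Content" step of the demo can be approximated in plain Python. This sketch covers only the routing decision; tailing, Kafka, and HDFS are left out. The route names, the regular expressions, and the sample log lines are all assumptions made for illustration, not taken from the demo itself.

```python
import re
from collections import defaultdict

# Hypothetical routing rules: route name -> pattern that selects it.
ROUTES = {
    "errors": re.compile(r"\bERROR\b"),
    "warnings": re.compile(r"\bWARN\b"),
}


def route_on_content(lines):
    """Send each line to the first route whose pattern matches, else to 'unmatched'."""
    routed = defaultdict(list)
    for line in lines:
        for name, pattern in ROUTES.items():
            if pattern.search(line):
                routed[name].append(line)
                break
        else:  # no pattern matched this line
            routed["unmatched"].append(line)
    return dict(routed)


sample = [
    "2016-06-17 17:15:04 INFO service started",
    "2016-06-17 17:15:09 ERROR disk full",
    "2016-06-17 17:15:12 WARN retrying write",
]
routed = route_on_content(sample)
```

Each route would then feed its own downstream connection, so errors could be delivered with a different priority or destination than routine lines.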
What is Data Provenance and Why is it Important?
BEGIN
END
LINEAGE
IT and Cloud Operators
• Understand traceability, lineage
• Enable recovery and replay
Compliance Regulations
• Provide an audit trail
• Remediation capabilities
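One way to picture lineage is as a log of events that can be walked backwards from any piece of data to its origin. The sketch below is a simplified model, not NiFi's provenance repository: `ProvenanceEvent`, `ProvenanceLog`, the component names, and the event-type strings are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_id: str
    event_type: str                  # illustrative labels such as "RECEIVE" or "SEND"
    component: str                   # which step produced the event
    parent_id: Optional[str] = None  # set when this FlowFile was derived from another


class ProvenanceLog:
    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, event: ProvenanceEvent) -> None:
        self._events.append(event)

    def lineage(self, flowfile_id: str) -> List[ProvenanceEvent]:
        """Follow parent links back to the origin; return events oldest first."""
        chain: List[ProvenanceEvent] = []
        current: Optional[str] = flowfile_id
        while current is not None:
            events = [e for e in self._events if e.flowfile_id == current]
            chain = events + chain
            parents = [e.parent_id for e in events if e.parent_id is not None]
            current = parents[0] if parents else None
        return chain


log = ProvenanceLog()
log.record(ProvenanceEvent("ff-1", "RECEIVE", "TailLogs"))
log.record(ProvenanceEvent("ff-2", "FORK", "SplitLines", parent_id="ff-1"))
log.record(ProvenanceEvent("ff-2", "SEND", "DeliverToStore"))
trail = [(e.flowfile_id, e.event_type) for e in log.lineage("ff-2")]
```

The same trail serves both audiences on the slide: operators can replay from any recorded point, and auditors get an ordered record of what touched the data.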
Provenance Enables Easy Access and Traceability of Changes
Need Fine-Grained Security and Compliance?
Security
• Secured authentication
• Enterprise authorization services – entitlements change often
• Encrypted content, encrypted communications
• People and systems with different roles require different access levels
• Tagged/classified data
Repositories - Pass by reference
Repositories – Copy on Write
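The two repository slides carry only their titles here, so the following is a sketch of the general idea rather than of NiFi's implementation. It assumes that a FlowFile record holds a reference to stored content instead of the bytes themselves, and that changing content writes a new entry instead of overwriting the old one. `ContentRepository`, `FlowFileRecord`, and `claim_id` are hypothetical names.

```python
from itertools import count
from typing import Dict, NamedTuple


class FlowFileRecord(NamedTuple):
    attributes: dict
    claim_id: int  # a reference to the content, not the content itself


class ContentRepository:
    """Stores content blobs under ids; existing blobs are never modified."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._ids = count(1)

    def write(self, data: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = data
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


repo = ContentRepository()
original = FlowFileRecord({"filename": "a.log"}, repo.write(b"hello world"))

# Pass by reference: an attribute-only change reuses the same stored content.
routed = FlowFileRecord({**original.attributes, "route": "errors"}, original.claim_id)

# Copy on write: a content change is written as a new blob; the old one stays intact.
upper = repo.read(original.claim_id).upper()
modified = FlowFileRecord(original.attributes, repo.write(upper))
```

Because the original blob is never touched, earlier versions of the content remain available, which is what makes replay from a previous step possible.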
Agenda
HDF 2.0: Flow Management
HDF 2.0 Platform Evolution
– Product offering
– Example use case
• Constrained
• High-latency
• Localized context
• Hybrid – cloud / on-premises
• Low-latency
• Global context
Core Infrastructure
Hortonworks DataFlow Manages Data in Motion
Regional Infrastructure
Sources
DataFlow Management and Stream Processing
Core Infrastructure
Sources
• Constrained
• High-latency
• Localized context
• Hybrid – cloud / on-premises
• Low-latency
• Global context
Regional Infrastructure
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
– Backpressure
– Pressure release
• Prioritized queuing
• Flow specific QoS
– Latency vs. throughput
– Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
NiFi vs. MiNiFi Java Agent
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components
Example: Company X provides alerting services when users’ resting heart rate is higher than a threshold
Real-Time Insights Require DataFlow Mgmt and Stream Processing
Diagram: Acquire Data (Company X Cloud Instances 1, 2, 3); Acquire Data Across Cloud Instances; Parse, Filter, Validate, Enrich and Route; Core Data Center; Analytics/Pattern Match; Data Store; Alerts; Dashboards/Visualization
Legend: Flow Management, Stream Processing
Data in Motion Needs Dataflow Management and Stream Processing
• Acquire data from the cloud instances of various wearable devices
• Move data from customer cloud instances to an on-premises instance
• Perform intelligent routing and filtering of data. The routing and filtering rules will often be changed at run time.
• Deliver the data to various downstream systems. New downstream apps will keep appearing, and the data should be fed to each one when it comes online.
• Parse the device data into a standardized format that downstream systems can understand
• Enrich the data with contextual information, including patient/customer info (age, sex, etc.)
• Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
• Run an outlier detection model on the incoming heart-rate stream. If the score is above a certain threshold, alert on the heart rate.
Flow Management (NiFi, MiNiFi)
Stream Processing (Storm, Kafka)
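The last two requirements, a threshold alert and an outlier score, can be sketched without any streaming framework. The slide does not name a model, so the rolling z-score below is an assumption swapped in for illustration; the function and class names, the window size, and the sample readings are likewise hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def threshold_alerts(readings, threshold_bpm):
    """Return the (user, bpm) readings whose resting heart rate exceeds the threshold."""
    return [(user, bpm) for user, bpm in readings if bpm > threshold_bpm]


class RollingOutlierScorer:
    """Scores each value against the window of values that came before it."""

    def __init__(self, window: int) -> None:
        self._values = deque(maxlen=window)

    def score(self, value: float) -> float:
        history = list(self._values)
        self._values.append(value)
        if len(history) < 2:  # not enough history to judge yet
            return 0.0
        spread = pstdev(history)
        if spread == 0:       # flat history: any deviation at all is extreme
            return 0.0 if value == history[0] else float("inf")
        return abs(value - mean(history)) / spread


readings = [("u1", 62), ("u2", 104), ("u1", 64)]
alerts = threshold_alerts(readings, threshold_bpm=100)

scorer = RollingOutlierScorer(window=5)
scores = [scorer.score(bpm) for bpm in [60, 62, 61, 63, 120]]
outlier_flags = [s > 3.0 for s in scores]  # 3.0 is an arbitrary cut-off for the sketch
```

The threshold check needs only the current reading, while the outlier score needs state across readings, which is the kind of windowed work the slide assigns to stream processing.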
Use Cases for Data in Motion
Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.
When Only DataFlow Management is Required
Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.
When Both DataFlow Management and Stream Processing
Flow management
DATA IN MOTION
DATA AT REST
IoT Data Sources
AWS
Azure
Google Cloud
Hadoop
NiFi
Kafka
Storm
Others…
[Diagram: multiple NiFi and MiNiFi nodes]
HDF 2.0: Data-in-Motion Platform
Enterprise Services
Ambari Ranger Other services
Flow management + Stream Processing
New Stream Processing Features HDF 2.0
• New Storm Connectors
• Storm-Kafka Spout using new client APIs
• Storm Distributed Log Search
• Storm Dynamic Worker Profiling
• Kafka Grafana Integration
• Storm Grafana Integration
• Improved Nimbus HA
• Storm Automatic Back Pressure
• Storm Distributed cache
• Storm Windowing and State Management
• Storm Performance improvements
• Improved Kafka SASL
• Storm Topology Event inspector
• Storm Resource Aware Scheduling
• Storm Dynamic Log Levels
• Pacemaker Storm Daemon
• Kafka Rack Awareness
Developer Productivity, Enterprise Readiness, Operational Simplicity
For More Info: https://community.hortonworks.com/
Hortonworks Community Connection:
Data Ingestion and Streaming

Hortonworks Data in Motion Webinar Series - Part 1

