Guide to New Features of
Hortonworks DataFlow 2.0
Haimo Liu
Product Manager
Bryan Bende
Sr. Software Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Connected Data Platforms
Stream Processing
Flow Management
Enterprise Services
At the edge
Security
Visualization
On premises In the cloud
Registries/Catalogs Governance (Security/Compliance) Operations
HDF 2.0 – Data in Motion Platform
HDF 2.0 – Data in Motion Platform
[Architecture diagram. Data in motion: IoT data sources, MiNiFi agents, and NiFi (flow management); NiFi, Kafka, Storm, and others (flow management + stream processing). Data at rest: AWS, Azure, Google Cloud, Hadoop. Enterprise services: Ambari, Ranger, other services.]
Dataflow Management
Problems Today: Timely Access to Data and Decisions
http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
“HDF helps us to streamline the flow
of data and build models and
visualisations quickly, so that my team
can work iteratively with business
colleagues on building solutions
that work for the business.”
Royal Mail
HDF Makes Big Data Ingest Easy
Without HDF: complicated, messy, and takes weeks to months to move the right data into Hadoop
With HDF: streamlined, efficient, easy
Target: HDP (Hortonworks Data Platform), powered by Apache Hadoop
Create a live dataflow in minutes
How would that change your business?
Add processor for data intake. Time: 1 minute
1 Drag and drop processor from top menu
Choose the specific processor
2 Choose one of the processors – currently 170+ available
Example: Pick Twitter Processor
Configure the processor. Time: 2 minutes
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another processor for data output. Time: 1 minute
5 Drag and drop processor from top menu
6 Filter for and select a “Put” processor
Configure second processor. Time: 1 minute
7 Configure 2nd processor
Connect processors, configure connection. Time: 2 minutes
8 Configure Connection
Note: Sample Flow is different from previous example of PutHDFS. This dataflow is PutFile. Same concepts apply.
Click Start to Begin Processing. Time total: 7 minutes
9 Click start “play” to begin processing
(will run continuously until you select stop)
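The nine steps above amount to an intake processor, a connection (queue), and an output processor. As a rough mental model only, the sketch below wires those three pieces together in plain Python. The `Connection` class and `run_flow` function are hypothetical names for illustration and are not part of NiFi, which is configured through its UI rather than code like this.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class Connection:
    """A queue linking an upstream processor to a downstream one (illustrative)."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()


def run_flow(source: Iterable[str], sink: Callable[[str], None]) -> int:
    """Move every item from source to sink through a connection; return the count."""
    connection = Connection()
    for item in source:  # the intake processor enqueues what it receives
        connection.queue.append(item)
    delivered = 0
    while connection.queue:  # the output ("Put") processor drains the queue
        sink(connection.queue.popleft())
        delivered += 1
    return delivered


# Usage: two records flow from a stand-in source to a stand-in sink.
written: List[str] = []
count = run_flow(["record-1", "record-2"], written.append)
```

A real flow runs continuously until stopped; this sketch processes one finite batch to keep the example short.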
HDF 2.0: what’s new?
Challenges
Different devices
Globally distributed organization
Intelligence on the edge
Time to delivery
Getting the right data to
the right place at the
right time is not trivial!
Challenges & NiFi
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
Globally distributed organizations
Intelligence on the edge
Time to delivery
Support disparate,
distributed systems
with easy drag & drop
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
Globally distributed organizations
Intelligence on the edge
Time to delivery
Expanded ecosystem
HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
HL7
FTP
UDP
XML
SFTP
HTTP
Syslog
Email
HTML
Image
AMQP
MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
Fetch
Deeper Ecosystem Integration – New Processors
Processor: Description
Publish/ConsumeKafka: Two NARs, with Kafka 0.9 and 0.10 client libraries, respectively
JoltTransformJson: Manipulate JSON data on the fly, with preview functionality
GenerateTableFetch: Incremental fetch + parallel fetch against source table partitions
PutHiveQL: Ingest to Hive tables
SelectHiveQL: Select from Hive tables
PutHiveStreaming: Ingest streaming data to Hive, leveraging the Hive Streaming API
ConvertAvroToORC: Format conversion, Avro to ORC
Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world
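To make "incremental fetch + parallel fetch" concrete, here is a minimal sketch of the general idea: split a table read into page-sized queries that can run in parallel, and only ask for rows beyond the last value already seen. The function name, parameters, and generated SQL are assumptions for illustration; they are not the GenerateTableFetch implementation, and real SQL dialects differ in paging syntax.

```python
from typing import List, Optional


def generate_fetch_queries(
    table: str,
    max_value_column: str,
    last_seen: Optional[int],
    row_count: int,
    page_size: int,
) -> List[str]:
    """Build one query per page; each page can be fetched by a separate worker.

    Values are interpolated directly for readability. Do not do this with
    untrusted input.
    """
    # Incremental part: skip rows already fetched in an earlier run.
    where = f" WHERE {max_value_column} > {last_seen}" if last_seen is not None else ""
    queries = []
    # Parallel part: one LIMIT/OFFSET window per page.
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table}{where} ORDER BY {max_value_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries


# Usage: 25 new rows past id 100, fetched in pages of 10.
queries = generate_fetch_queries("orders", "id", last_seen=100, row_count=25, page_size=10)
```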
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
• Redesigned UI, refreshed user experience
Globally distributed organizations
Intelligence on the edge
Time to delivery
More intuitive user
interface
Modernized UI – Complete Interface Redesign
Challenges & NiFi
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site to Site communication, secured by 2-way SSL
• Environment neutral
Intelligence on the edge
Time to delivery
Secure communications
across disparate,
distributed systems
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site to Site communication, secured by 2-way SSL
• Environment neutral
• Variable registry
Intelligence on the edge
Time to delivery
Simplifies flow
provisioning
Variable Registry
▪ Variable registry
– To automatically resolve environment-specific values
• Example: connection string
• The same key referenced in a template can be mapped to different values in DEV vs PROD
– In-memory variable registry
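A minimal sketch of the idea on this slide: one key in a template, different values per environment. It assumes the `${key}` reference style used by NiFi Expression Language; the `resolve` function, the key name, and the connection strings are hypothetical examples, not NiFi internals.

```python
import re
from typing import Dict

# Matches ${some.key} references inside a property value.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(template_value: str, registry: Dict[str, str]) -> str:
    """Replace each ${key} with its value from the in-memory registry.

    Unknown keys are left untouched so a missing mapping is easy to spot.
    """

    def lookup(match):
        key = match.group(1)
        return registry.get(key, match.group(0))

    return _REFERENCE.sub(lookup, template_value)


# Hypothetical per-environment registries holding the same key.
DEV = {"db.connection.url": "jdbc:postgresql://dev-db:5432/flows"}
PROD = {"db.connection.url": "jdbc:postgresql://prod-db:5432/flows"}

# The template carries only the key, so it can move between environments unchanged.
template_property = "${db.connection.url}"
dev_value = resolve(template_property, DEV)
prod_value = resolve(template_property, PROD)
```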
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment neutral
• Variable registry
• Better deployment management, Apache Ambari integration
Intelligence on the edge
Time to delivery
Simplified operations in
distributed environments
Ambari Integration
Ambari-NiFi Integration
▪ NiFi cluster management
– Start/stop NiFi service
– Centralized place for managing config files
▪ Ambari to display NiFi metrics
▪ Ambari to manage Kerberos authentication
HDF 2.0 Deployment Model
▪ Automated deployment by Ambari
▪ Manual RPM deployment
▪ Tar.gz/zip deployment (NiFi/MiNiFi Java)
▪ Tar.gz for most Linux/Mac, compile your own for other OS (MiNiFi C++)
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site to Site communication, secured by 2-way SSL
• Environment neutral
• Variable registry
• Better deployment management, Apache Ambari integration
• Enhanced Site to Site communication
Intelligence on the edge
Time to delivery
Modularized s2s to support
pluggable protocols
Challenges & NiFi
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via S2S
• Bi-directional communication
Time to delivery
Analytics at the Edge
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via Site to Site protocol
• Bi-directional communication
• Apache MiNiFi, bi-directional command and control on the edge
Time to delivery
Edge Intelligence
for the
first mile
Edge Intelligence with Apache MiNiFi
Key Features
▪ Guaranteed delivery
▪ Data buffering
‒ Backpressure
‒ Pressure release
▪ Prioritized queuing
▪ Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
▪ Data provenance
▪ Recovery / recording a rolling log of fine-grained history
▪ Designed for extension
Different from Apache NiFi
▪ Design and Deploy
▪ Warm re-deploys
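Two of the listed features, backpressure and prioritized queuing, can be sketched with a bounded priority queue. This is a simplified illustration under stated assumptions: the `PrioritizedBuffer` class and its threshold parameter are hypothetical, lower numbers mean higher priority, and pressure release (dropping or aging off data) is not modeled.

```python
import heapq
from typing import List, Tuple


class PrioritizedBuffer:
    """Bounded buffer that refuses new items when full and releases by priority."""

    def __init__(self, backpressure_threshold: int) -> None:
        self.backpressure_threshold = backpressure_threshold
        self._heap: List[Tuple[int, int, str]] = []
        self._sequence = 0  # tie-breaker keeps equal priorities in arrival order

    def is_full(self) -> bool:
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority: int, payload: str) -> bool:
        """Return False when the threshold is reached so upstream can pause."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, self._sequence, payload))
        self._sequence += 1
        return True

    def poll(self) -> str:
        """Remove and return the highest-priority (lowest number) payload."""
        return heapq.heappop(self._heap)[2]


# Usage: the third offer is rejected; the alarm leaves the buffer first.
buffer = PrioritizedBuffer(backpressure_threshold=2)
accepted = [
    buffer.offer(5, "routine reading"),
    buffer.offer(1, "alarm"),
    buffer.offer(3, "status"),
]
first_out = buffer.poll()
```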
NiFi vs. MiNiFi Java Agent
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components
Challenges & NiFi
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
FAST AND EASY
To get results, tune and
change dataflows
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
High availability
Zero-master Clustering
▪ New clustering paradigm
▪ Zero-master clustering
– Multiple entry points, no master node, no single point of failure
– Auto-elected cluster coordinator for cluster maintenance
– Automatic failover handling
HDF 2.0 (NiFi 1.0.0)
Zero-master Clustering
Zero-master Clustering
Heartbeat messages (every 5s by default)
Node status: connecting/connected/disconnecting/disconnected
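The heartbeat and node-status notes above can be sketched as a small coordinator that tracks when each node last reported in. The 5-second interval comes from the slide; the `ClusterCoordinator` class, the missed-beat allowance of 3, and the omission of the "disconnecting" state are assumptions made to keep the illustration short, not a description of NiFi's actual logic.

```python
from typing import Dict

HEARTBEAT_INTERVAL_S = 5.0  # default interval stated on the slide
MISSED_BEATS_ALLOWED = 3    # assumption for this sketch, not a NiFi setting


class ClusterCoordinator:
    """Tracks node status from heartbeat timestamps (illustrative only)."""

    def __init__(self) -> None:
        self.last_heartbeat: Dict[str, float] = {}
        self.status: Dict[str, str] = {}

    def request_join(self, node_id: str) -> None:
        self.status[node_id] = "connecting"

    def heartbeat(self, node_id: str, now: float) -> None:
        self.last_heartbeat[node_id] = now
        self.status[node_id] = "connected"

    def sweep(self, now: float) -> None:
        """Mark connected nodes as disconnected once they have been silent too long."""
        timeout = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        for node_id, seen in self.last_heartbeat.items():
            if self.status[node_id] == "connected" and now - seen > timeout:
                self.status[node_id] = "disconnected"


# Usage: node-1 keeps reporting, node-2 goes silent after its first heartbeat.
coordinator = ClusterCoordinator()
coordinator.request_join("node-1")
coordinator.heartbeat("node-1", now=0.0)
coordinator.heartbeat("node-2", now=0.0)
coordinator.heartbeat("node-1", now=14.0)
coordinator.sweep(now=16.0)
```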
Zero-master Clustering
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
• Multi-tenancy flow editing, and authorization
Secured enterprise wide
collaboration
Multi-tenant Flow Editing
▪ Multi-tenant flow editing
– Self-service collaborative model, Google Docs-style user experience
– Multiple teams making edits to different processors at the same time
– Only the component being modified is locked, not the entire flow
– Scalable model to speed up flow editing
HDF 2.0 (NiFi 1.0.0)
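The point that only the component being modified is locked can be sketched with one lock per component instead of one lock for the whole flow. The `FlowCanvas` class and the component names used below are hypothetical; this shows the locking granularity only and does not claim to mirror how NiFi coordinates edits internally.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class FlowCanvas:
    """Holds one lock per component so edits to different components do not block."""

    def __init__(self) -> None:
        self._locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)

    def begin_edit(self, component_id: str) -> bool:
        """Try to claim a component; False means someone else is editing it."""
        return self._locks[component_id].acquire(blocking=False)

    def end_edit(self, component_id: str) -> None:
        self._locks[component_id].release()


# Usage: two teams edit different processors at once; a third must wait.
canvas = FlowCanvas()
team_a = canvas.begin_edit("GetTwitter")
team_b = canvas.begin_edit("PutHDFS")     # different component, not blocked
team_c = canvas.begin_edit("GetTwitter")  # same component, refused for now
canvas.end_edit("GetTwitter")
team_c_retry = canvas.begin_edit("GetTwitter")
```

With a single flow-wide lock, the second team would have been blocked as well, which is the scaling problem the slide refers to.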
Multi-tenant Authorization
▪ Component level authorization
– New authorizer API
– “Read” and “Write” permissions
– Protection against unauthorized usage without losing context
▪ Authorization management
– Internal management (NiFi)
– External management (Ranger, etc.)
HDF 2.0 (NiFi 1.0.0)
Multi-tenant Authorization
Read Permission
Processor name visible
Processor configuration visible
Multi-tenant Authorization
NO Read Permission
Processor name & configuration invisible (content)
Statistics visible (context)
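The two authorization slides describe one rule: without read permission a user still sees a processor's statistics (context) but not its name or configuration (content). A minimal sketch of that rule follows. The `Processor` dataclass, the `view_for` function, and the sample user names are hypothetical and do not represent the NiFi authorizer API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Processor:
    name: str
    configuration: Dict[str, str]
    flowfiles_in: int
    readers: Set[str] = field(default_factory=set)  # users granted "Read"


def view_for(user: str, processor: Processor) -> Dict[str, object]:
    """Return what the given user may see of this processor."""
    # Statistics are always visible, so the flow keeps its context.
    view: Dict[str, object] = {"statistics": {"flowfiles_in": processor.flowfiles_in}}
    # Name and configuration are shown only with read permission.
    if user in processor.readers:
        view["name"] = processor.name
        view["configuration"] = dict(processor.configuration)
    return view


# Usage: alice has read permission, bob does not.
put_file = Processor(
    name="PutFile",
    configuration={"Directory": "/data/out"},
    flowfiles_in=42,
    readers={"alice"},
)
alice_view = view_for("alice", put_file)
bob_view = view_for("bob", put_file)
```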
Questions?
Hortonworks Community Connection:
Data Ingestion and Streaming
https://community.hortonworks.com/
Bryan Bende
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
Bryan Bende
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Aldrin Piri
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
Aaron Brooks
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
Accumulo Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
DataWorks Summit
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
Joseph Witt
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
Webinar Series Part 5 New Features of HDF 5

  • 1. Guide to New Features of Hortonworks DataFlow 2.0. Haimo Liu, Product Manager; Bryan Bende, Sr. Software Engineer
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Connected Data Platforms
  • 3. HDF 2.0 – Data in Motion Platform (diagram labels): Stream Processing; Flow Management; Enterprise Services; At the edge; Security; Visualization; On premises; In the cloud; Registries/Catalogs; Governance (Security/Compliance); Operations
  • 4. HDF 2.0 – Data in Motion Platform (diagram labels): Flow Management; Flow Management + Stream Processing; Data in Motion; Data at Rest; IoT Data Sources; AWS, Azure, Google Cloud, Hadoop; NiFi, Kafka, Storm, Others…; NiFi and MiNiFi nodes (repeated across the diagram); Enterprise Services: Ambari, Ranger, Other services
  • 5. Dataflow Management
  • 6. Problems Today: Timely Access to Data and Decisions. “HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.” (Royal Mail) Source: http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
  • 7. HDF Makes Big Data Ingest Easy. From “Complicated, messy, and takes weeks to months to move the right data into Hadoop” to “Streamlined, Efficient, Easy”. Destination: HDP (Hortonworks Data Platform, powered by Apache Hadoop)
  • 8. Create a live dataflow in minutes. How would that change your business?
  • 9. Add processor for data intake. Time: 1 minute. Step 1: Drag and drop processor from top menu.
  • 10. Choose the specific processor. Step 2: Choose one of the processors (currently 170+ available).
  • 11. Example: Pick Twitter Processor
  • 12. Configure the processor. Time: 2 minutes. Steps 3 and 4: Select processor and choose option to Configure; adjust parameters as required.
  • 13. Another processor for data output. Time: 1 minute. Steps 5 and 6: Drag and drop processor from top menu; filter for and select a “Put” processor.
  • 14. Configure second processor. Time: 1 minute. Step 7: Configure 2nd processor.
  • 15. Connect processors, configure connection. Time: 2 minutes. Step 8: Configure Connection. Note: the sample flow shown here differs from the previous PutHDFS example; this dataflow uses PutFile. The same concepts apply.
  • 16. Click Start to Begin Processing. Time total: 7 minutes. Step 9: Click start (“play”) to begin processing (will run continuously until you select stop).
  • 17. HDF 2.0: what’s new?
  • 18. Challenges: Different devices; Globally distributed organization; Intelligence on the edge; Time to delivery. Getting the right data to the right place at the right time is not trivial!
  • 19. Challenges & NiFi. Different devices (different standards/protocols/formats): • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol. Globally distributed organizations. Intelligence on the edge. Time to delivery. Support disparate, distributed systems with easy drag & drop.
  • 20. Challenges & NiFi & HDF 2.0. Different devices (different standards/protocols/formats): • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total. Globally distributed organizations. Intelligence on the edge. Time to delivery. Expanded ecosystem.
  • 21. HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2. Processor functions shown: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch. Protocols and formats shown: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT. All Apache project logos are trademarks of the ASF and the respective projects.
  • 22. Deeper Ecosystem Integration – New Processors (processor: description). Publish/ConsumeKafka: two NARs, with Kafka 0.9 and 0.10 client libraries, respectively; JoltTransformJson: manipulate JSON data on the fly, with a preview function; GenerateTableFetch: incremental fetch plus parallel fetch against source table partitions; PutHiveQL: ingest to Hive tables; SelectHiveQL: select from Hive tables; PutHiveStreaming: ingest streaming data to Hive, leveraging the Hive streaming API; ConvertAvroToORC: format conversion, Avro to ORC; Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world.
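The "incremental fetch + parallel fetch" idea noted for GenerateTableFetch can be sketched as follows. This is a minimal illustration, not the processor's actual query generation: the function name, its parameters, the table `orders`, and the column `id` are all hypothetical. It assumes a monotonically increasing column whose last seen maximum is remembered between runs, and it emits one page-sized query per partition so that downstream workers could run them in parallel.

```python
def generate_fetch_queries(table, max_value_column, last_max, new_max,
                           row_count, partition_size):
    """Build one paged SELECT per partition for rows added since the last run.

    Hypothetical helper: only rows with last_max < column <= new_max are
    covered, which is what makes the fetch incremental.
    """
    queries = []
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > {last_max} "
            f"AND {max_value_column} <= {new_max} "
            f"ORDER BY {max_value_column} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries


# Illustrative values: 250 new rows since id 1000, fetched in pages of 100.
queries = generate_fetch_queries(
    "orders", "id", last_max=1000, new_max=1250,
    row_count=250, partition_size=100,
)
for query in queries:
    print(query)
```

Each generated statement is independent of the others, so the pages can be handed to separate workers; the stored maximum would then advance to `new_max` for the next run.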
  • 23. Challenges & NiFi & HDF 2.0. Different devices (different standards/protocols/formats): • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total • Redesigned UI, refreshed user experience. Globally distributed organizations. Intelligence on the edge. Time to delivery. More intuitive user interface.
  • 24. Modernized UI – Complete Interface Redesign
  • 25. Challenges & NiFi. Different devices. Globally distributed organizations (dataflow across multiple data centers): • Internal Site-to-Site communication, secured by 2-way SSL • Environment-neutral. Intelligence on the edge. Time to delivery. Secure communications across disparate, distributed systems.
  • 26. Challenges & NiFi & HDF 2.0
      Different devices
      Globally distributed organizations: dataflow across multiple data centers
        • Internal Site-to-Site communication, secured by 2-way SSL
        • Environment-neutral
        • Variable registry
      Intelligence on the edge
      Time to delivery
      Simplifies flow provisioning
  • 27. Variable Registry
      • Variable registry
        – Automatically resolves environment-specific values
          • Example: connection string
          • The same key referenced in a template can be mapped to different values in DEV vs. PROD
        – In-memory variable registry
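
The DEV vs. PROD mapping can be sketched as a lookup of the same key against a per-environment registry. This is a minimal illustration, not NiFi code: the `REGISTRIES` dictionary, the `resolve` function, the key `db.url`, and the hostnames are all made up, and the `${key}` reference style is assumed here to mirror how a template would reference a variable.

```python
import re

# Hypothetical per-environment registries; keys and values are invented.
REGISTRIES = {
    "DEV": {"db.url": "jdbc:postgresql://dev-db:5432/flows"},
    "PROD": {"db.url": "jdbc:postgresql://prod-db:5432/flows"},
}

# Matches references of the form ${some.key}.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(template_value, environment):
    """Replace every ${key} reference with the value registered for the environment.

    Raises KeyError if a referenced key is missing from the registry.
    """
    registry = REGISTRIES[environment]
    return _REFERENCE.sub(lambda match: registry[match.group(1)], template_value)
```

The same template value, for example `"${db.url}"`, then resolves to the DEV connection string in one environment and the PROD connection string in the other, without editing the template.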
  • 28. Challenges & NiFi & HDF 2.0
      Different devices
      Globally distributed organizations: dataflow across multiple data centers
        • Internal Site-to-Site communication, secured by 2-way SSL
        • Environment-neutral
        • Variable registry
        • Better deployment management, Apache Ambari integration
      Intelligence on the edge
      Time to delivery
      Simplified operations in distributed environments
  • 29. Ambari Integration
      Ambari-NiFi Integration
        • NiFi cluster management
          – Start/stop NiFi service
          – Centralized place for managing config files
        • Ambari to display NiFi metrics
        • Ambari to manage Kerberos authentication
      HDF 2.0 Deployment Model
        • Automated deployment by Ambari
        • Manual RPM deployment
        • Tar.gz/zip deployment (NiFi/MiNiFi Java)
        • Tar.gz for most Linux/Mac, compile your own for other OS (MiNiFi C++)
  • 30. Challenges & NiFi & HDF 2.0
      Different devices
      Globally distributed organizations: dataflow across multiple data centers
        • Internal Site-to-Site communication, secured by 2-way SSL
        • Environment-neutral
        • Variable registry
        • Better deployment management, Apache Ambari integration
        • Enhanced Site-to-Site communication
      Intelligence on the edge
      Time to delivery
      Modularized Site-to-Site (S2S) to support pluggable protocols
  • 31. Challenges & NiFi
      Different devices, Globally distributed organizations
      Intelligence on the edge: analytics on resource-constrained devices
        • Run a single node on the edge, communicating back via Site-to-Site (S2S)
        • Bi-directional communication
      Time to delivery
      Analytics at the Edge
  • 32. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations
      Intelligence on the edge: analytics on resource-constrained devices
        • Run a single node on the edge, communicating back via the Site-to-Site protocol
        • Bi-directional communication
        • Apache MiNiFi, bi-directional command and control on the edge
      Time to delivery
      Edge Intelligence for the first mile
  • 33. Edge Intelligence with Apache MiNiFi
      Key Features
        • Guaranteed delivery
        • Data buffering
          – Backpressure
          – Pressure release
        • Prioritized queuing
        • Flow-specific QoS
          – Latency vs. throughput
          – Loss tolerance
        • Data provenance
        • Recovery / recording a rolling log of fine-grained history
        • Designed for extension
      Different from Apache NiFi
        • Design and Deploy
        • Warm re-deploys
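
Backpressure and prioritized queuing, two of the features listed above, can be sketched together as a bounded priority queue. This is an illustrative toy, not the NiFi or MiNiFi connection implementation: the class name `BoundedPriorityQueue`, the `offer`/`poll` method names, and the "lower number means higher priority" convention are assumptions.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy queue: refuses new items at a threshold and releases by priority."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._arrival = itertools.count()

    def offer(self, priority, item):
        """Enqueue the item, or return False to signal backpressure upstream."""
        if len(self._heap) >= self.backpressure_threshold:
            return False
        heapq.heappush(self._heap, (priority, next(self._arrival), item))
        return True

    def poll(self):
        """Return the highest-priority item (lowest number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A producer that receives `False` from `offer` is expected to slow down or buffer elsewhere; once a consumer calls `poll`, capacity is freed and the producer can resume.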
  • 34. NiFi vs. MiNiFi Java Agent
      MiNiFi: NiFi Framework, Components
      NiFi: NiFi Framework, User Interface, Components
  • 35. Challenges & NiFi
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
      FAST AND EASY to get results, tune and change dataflows
  • 36. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
        • Control plane high availability, zero-master clustering
      High availability
  • 37. Zero-master Clustering, HDF 2.0 (NiFi 1.0.0)
      • New clustering paradigm
      • Zero-master clustering
        – Multiple entry points, no master node, no single point of failure
        – Auto-elected cluster coordinator for cluster maintenance
        – Automatic failover handling
  • 38. Zero-master Clustering
  • 39. Zero-master Clustering
      Heartbeat messages (every 5s by default)
      Node status: connecting/connected/disconnecting/disconnected
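
How heartbeats could drive the node status can be sketched as follows. This is a simplified illustration, not the NiFi cluster coordinator: the 5-second interval comes from the slide, but the class name `HeartbeatMonitor`, the "8 missed heartbeats" tolerance, and the decision to model only three of the four statuses (the transitional "disconnecting" state is left out) are assumptions.

```python
HEARTBEAT_INTERVAL_S = 5.0    # default interval stated on the slide
MISSED_BEFORE_DISCONNECT = 8  # assumed tolerance, chosen for illustration


class HeartbeatMonitor:
    """Toy coordinator view: derives node status from the last heartbeat time."""

    def __init__(self):
        self._last_seen = {}

    def record(self, node_id, now):
        """Store the time (in seconds) at which a node's heartbeat arrived."""
        self._last_seen[node_id] = now

    def status(self, node_id, now):
        """Classify a node as connecting, connected, or disconnected."""
        last = self._last_seen.get(node_id)
        if last is None:
            # No heartbeat received yet from this node.
            return "connecting"
        if now - last > HEARTBEAT_INTERVAL_S * MISSED_BEFORE_DISCONNECT:
            return "disconnected"
        return "connected"
```

Passing the current time in explicitly keeps the sketch deterministic; a real monitor would read a clock and react to status changes.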
  • 40. Zero-master Clustering
  • 41. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
        • Control plane high availability, zero-master clustering
        • Multi-tenant flow editing and authorization
      Secured enterprise-wide collaboration
  • 42. Multi-tenant Flow Editing, HDF 2.0 (NiFi 1.0.0)
      • Multi-tenant flow editing
        – Self-service collaborative model, with a Google Docs-style user experience
        – Multiple teams making edits to different processors at the same time
        – Only the component being modified is locked, not the entire flow
        – Scalable model to speed up flow editing
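
Component-level concurrency, as opposed to locking the whole flow, can be sketched with a revision number per component. Note the substitution: the slide speaks of locking the component being modified, while this sketch uses optimistic revision checks to get a similar effect. The class names `Flow` and `RevisionConflict` and the processor names are illustrative assumptions, not NiFi's API.

```python
class RevisionConflict(Exception):
    """Raised when an edit is based on a stale revision of a component."""


class Flow:
    """Toy flow in which every component carries its own revision counter."""

    def __init__(self, components):
        self._config = dict(components)
        self._revision = {name: 0 for name in components}

    def revision(self, name):
        return self._revision[name]

    def update(self, name, expected_revision, new_config):
        """Apply an edit only if the caller saw the component's latest revision."""
        if self._revision[name] != expected_revision:
            raise RevisionConflict(name)
        self._config[name] = new_config
        self._revision[name] += 1
        return self._revision[name]
```

Two teams editing different processors never conflict, because each edit is checked only against the revision of the component it touches; a conflict arises only when two edits target the same component from the same starting revision.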
  • 43. Multi-tenant Authorization, HDF 2.0 (NiFi 1.0.0)
      • Component-level authorization
        – New authorizer API
        – “Read” and “Write” permissions
        – Protection against unauthorized usage without losing context
      • Authorization management
        – Internal management (NiFi)
        – External management (Ranger, etc.)
  • 44. Multi-tenant Authorization
      Read permission
        – Processor name visible
        – Processor configuration visible
  • 45. Multi-tenant Authorization
      No read permission
        – Processor name and configuration hidden (content)
        – Statistics visible (context)
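
The read-permission behavior on the last two slides, hiding content while keeping context, can be sketched as a view function. The function name `view_processor` and the dictionary fields are assumptions made for illustration; this is not the NiFi authorizer API.

```python
def view_processor(processor, can_read):
    """Return what a user may see of a processor.

    Statistics (context) are always included; the name and configuration
    (content) are included only when the user holds read permission.
    """
    view = {"id": processor["id"], "stats": processor["stats"]}
    if can_read:
        view["name"] = processor["name"]
        view["config"] = processor["config"]
    return view
```

A user without read permission still sees that a component exists and how much data flows through it, which is the "without losing context" property mentioned on the authorization slide.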
  • 46. Questions?
      Hortonworks Community Connection: Data Ingestion and Streaming
      https://community.hortonworks.com/
