Hortonworks DataFlow White Paper
Contents
What is Hortonworks DataFlow?
• Data Collection
• Operational Efficiency
• Bi-directional DataFlow
About Hortonworks
A single integrated platform for data acquisition, simple event processing, transport and
delivery from source to storage.
Hortonworks DataFlow enables the real-time collection and processing of perishable insights.
Hortonworks Data Platform can be used to enrich content and support changes to real-time dataflows.
[Figure labels: Perishable Insights; Historical Insights; Enrich Content]
Hortonworks DataFlow is designed to securely collect and transport data from highly diverse data sources be they big or
small, fast or slow, always connected or intermittently available.
Leverage Operational Efficiency
• Accelerate big data ROI via simplified data collection and a visually intuitive dataflow management interface
• Significantly reduce the cost and complexity of managing, maintaining and evolving dataflows
• Trace and verify the value of data sources for future investments
• Quickly adapt to new data sources through an extremely scalable, extensible platform

Make Better Business Decisions
• Make better business decisions with highly granular data sharing policies
• Focus on innovation by automating dataflow routing, management and troubleshooting without the need for coding
• Enable on-time, immediate decision making by leveraging real-time data and bi-directional dataflows
• Increase business agility with prioritized data feeds

Increase Data Security
• Support unprecedented yet simple-to-implement data security from source to storage
• Improve compliance and reduce risk through highly granular data access, data sharing and data usage policies
• Create a secure dataflow ecosystem with the ability to run the same security and encryption on small-scale, JVM-capable data sources as well as enterprise-class datacenters

• Accelerate big data ROI through a single, data-source-agnostic collection platform
• Reduce cost and complexity through an intuitive, real-time visual user interface
• Unprecedented yet simple-to-implement data security from source to storage
• Better business decisions with highly granular data sharing policies
• React in real time by leveraging bi-directional dataflows and prioritized data feeds
• Adapt to new data sources through an extremely scalable, extensible platform
[Figure labels: UI; Processor; Dynamically Adjust DataFlow; Real-Time Changes]
Figure 3: Current big data ingest solutions are complex and operationally inefficient
Case 2: Increased Security and Unprecedented Chain of Custody
Figure 4: Secure from source to storage with high-fidelity data provenance
Hortonworks DataFlow, with its inherent ability to support fine-grained provenance data and
metadata throughout the collection, transport and ingest process, provides the comprehensive,
detailed information needed for audit and remediation, unmatched by any existing data ingest
system in place today.
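To make "fine-grained provenance" more concrete, the Python sketch below records a lineage event each time a piece of data is received, routed or sent, so that the ordered events for one item form its chain of custody. It is a minimal illustration under stated assumptions: `ProvenanceEvent`, `ProvenanceLog`, the event labels (`RECEIVE`, `ROUTE`, `SEND`) and the component names are hypothetical and are not the Hortonworks DataFlow API.

```python
# Hypothetical illustration of a provenance (chain-of-custody) log.
# Not the Hortonworks DataFlow API; event types and fields are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceEvent:
    """One lineage record: what happened to a piece of data, where, and when."""
    data_id: str
    event_type: str            # illustrative labels such as "RECEIVE", "ROUTE", "SEND"
    component: str             # the collection, transport or ingest step involved
    timestamp: datetime
    details: Dict[str, str] = field(default_factory=dict)


class ProvenanceLog:
    """Append-only log; the ordered events for one data_id form its chain of custody."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, data_id: str, event_type: str, component: str, **details: str) -> None:
        # Timestamp every hop in UTC so events from different sites can be compared.
        self._events.append(
            ProvenanceEvent(data_id, event_type, component,
                            datetime.now(timezone.utc), dict(details))
        )

    def lineage(self, data_id: str) -> List[ProvenanceEvent]:
        # Events are only ever appended, so filtering preserves hop order.
        return [e for e in self._events if e.data_id == data_id]


# Usage: follow one hypothetical sensor reading from the edge to the datacenter.
log = ProvenanceLog()
log.record("sensor-42/reading-7", "RECEIVE", "edge-collector", source="sensor-42")
log.record("sensor-42/reading-7", "ROUTE", "edge-router", relationship="high-priority")
log.record("sensor-42/reading-7", "SEND", "datacenter-ingest", destination="hdfs")

for event in log.lineage("sensor-42/reading-7"):
    print(event.timestamp.isoformat(), event.event_type, event.component)
```

Keeping the log append-only is what makes it useful for audit: an investigator can replay the exact sequence of hops for a single item without any record being rewritten after the fact.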
At the same time, devices are producing more data than ever before. Much of the data being
produced is data-in-motion, and unlocking the business value of this data is crucial to the
business transformations of the modern economy.
Yet business transformation relies on accurate, secure access to data from the source through
to storage. Hortonworks DataFlow was designed with all these real-world constraints in mind:
power limitations, connectivity fluctuations, data security and traceability, data source diversity
and geographical distribution, all in service of accurate, time-sensitive decision making.
Hortonworks DataFlow is able to run security and encryption on small-scale, JVM-capable
data sources as well as enterprise-class datacenters. This gives the Internet of Things
a reliable, secure, common data collection and transport platform, with a real-time
feedback loop that continually and immediately improves algorithms and analysis for accurate,
informed, on-time decision making.
Hortonworks DataFlow allows the decision of whether to send, drop or locally store data to be
made at the edge, as needed and as conditions change. Additionally, with a fine-grained
command and control interface, data queues can be slowed down or accelerated to balance
the demands of the situation at hand against the current availability and cost of resources.
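The edge behavior described above can be pictured with a small Python sketch. It assumes a hypothetical three-way policy (send, store locally, drop) driven by link availability and link cost, plus a priority queue whose release rate an operator can change at runtime. `decide`, `EdgeQueue`, the priority levels and the cost threshold are illustrative names chosen for this sketch, not part of Hortonworks DataFlow; the white paper describes these controls as being driven from a visual interface rather than from user code.

```python
# Hypothetical sketch of edge-side send/store/drop decisions and a throttleable,
# prioritized queue. Names, priority levels and thresholds are illustrative only.
import heapq
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Action(Enum):
    SEND = "send"
    STORE_LOCALLY = "store"
    DROP = "drop"


@dataclass(order=True)
class Record:
    priority: int                        # 0 = most urgent; heapq pops the smallest first
    seq: int                             # arrival counter keeps FIFO order within a priority
    payload: str = field(compare=False)  # excluded from ordering


def decide(record: Record, link_up: bool, link_cost: float, max_cost: float) -> Action:
    """Send urgent data whenever the link is up and other data only when the link
    is cheap; otherwise buffer priority-1 data locally and drop the rest."""
    if link_up and (record.priority == 0 or link_cost <= max_cost):
        return Action.SEND
    if record.priority <= 1:
        return Action.STORE_LOCALLY
    return Action.DROP


class EdgeQueue:
    """Priority queue whose release rate can be slowed down or accelerated at runtime."""

    def __init__(self, batch_size: int) -> None:
        self._heap: List[Record] = []
        self._seq = 0
        self.batch_size = batch_size     # records released per drain() call

    def put(self, priority: int, payload: str) -> None:
        heapq.heappush(self._heap, Record(priority, self._seq, payload))
        self._seq += 1

    def drain(self) -> List[Record]:
        # Release at most batch_size records, most urgent first.
        n = min(self.batch_size, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(n)]


# Usage: an available but expensive link, with the operator throttling the queue.
q = EdgeQueue(batch_size=2)
q.put(2, "routine temperature reading")
q.put(0, "alarm: pressure spike")
q.put(1, "hourly summary")

batch = q.drain()  # alarm first, then hourly summary
actions = [decide(r, link_up=True, link_cost=5.0, max_cost=1.0) for r in batch]
q.batch_size = 1   # slow the queue down: one record per drain from now on
```

A min-heap keyed on (priority, arrival order) keeps urgent records at the front while preserving arrival order within each priority level; adjusting `batch_size` is one simple way to model slowing down or accelerating a queue as resource availability and cost change.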
With the ability to adapt seamlessly to resource constraints in real time, ensure secure data
collection and prioritize data transfer, Hortonworks DataFlow is a proven platform ideal for
the Internet of Things.
For an independent analysis of Hortonworks Data Platform and its leadership among Apache
Hadoop vendors, you can download the Forrester Wave™: Big Data Apache Hadoop Solutions,
Q1 2014 report from Forrester Research.
About Hortonworks
Hortonworks develops, distributes and supports the only 100% open source Apache Hadoop
data platform. Our team comprises the largest contingent of builders and architects within the
Apache Hadoop ecosystem, who represent and lead the broader enterprise requirements
within its communities. Hortonworks Data Platform deeply integrates with existing IT
investments upon which enterprises can build and deploy Apache Hadoop-based applications.
Hortonworks has deep relationships with the key strategic data center partners that enable our
customers to unlock the broadest opportunities from Apache Hadoop.