1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager (SAM) & Registry
Agenda
Registry
Streaming Analytics Manager (SAM)
Demo
Questions
History of Streaming at Hortonworks
• Introduced Storm as the stream processing engine in HDP-2.1 (late 2013)
• First to ship Apache Kafka as an enterprise messaging queue (early 2014)
• Added several improvements and features to Apache Storm
• Added security and critical features/improvements to Apache Kafka
• Many lessons learned from shipping Storm and Kafka for the past 3 years
• The vision and implementation of Registry and Streaming Analytics Manager are based on those lessons
Registry
• Foundational service that enables multiple use cases, including streaming, machine learning, service discovery, and application templates
• Offers base frameworks for developing Schema Registry, ML Registry, etc.
• Registry modules such as Schema Registry and ML Registry build their own entities on top of a versioned entity
• Modular approach to running registry services
• Users have the flexibility to choose which registry services they would like to enable
• Currently available: Schema Registry and ML Registry
What is Schema Registry? What Value Does It Provide?
• What is Schema Registry?
  – A shared repository of schemas that allows applications to flexibly interact with each other
• What value does Schema Registry provide?
  – Central metadata repository
    • Provides reusable schemas
    • Defines relationships between schemas
    • Enables generic format conversion and generic routing
  – Operational efficiency
    • Avoids attaching a schema to every piece of data
    • Producers and consumers can evolve at different rates
• Example use
  – Register schemas for Kafka topics to be used by consumers of those topics (e.g., NiFi, StreamLine)
Schema Registry Concepts
• Schema Group
  – A logical grouping/container for schemas of a similar type, or based on any criteria the customer has for managing schemas
• Schema Metadata
  – Metadata associated with a named schema
• Schema Version
  – The actual versioned schema associated with a schema metadata definition
[Diagram: Schema Metadata 1 (Schema Name, Schema Type, Description, Compatibility Policy, Serializers, Deserializers); Schema Group (Group Name); Schema Versions 1, 2, 3 (version, text, fingerprint)]
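The three concepts above can be sketched as a small data model. This is only an illustration: the class names, the field choices, and the use of a SHA-256 fingerprint to detect an already registered schema text are assumptions, not the Registry's actual classes or behavior.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the slide's concepts; these are not the Registry's own classes.
class SchemaVersion {
    final int version;        // 1, 2, 3, ...
    final String text;        // the schema definition itself
    final String fingerprint; // hash of the text

    SchemaVersion(int version, String text, String fingerprint) {
        this.version = version;
        this.text = text;
        this.fingerprint = fingerprint;
    }
}

class SchemaMetadata {
    final String name;
    final String type;                // e.g. "avro"
    final String schemaGroup;         // e.g. "kafka"
    final String compatibilityPolicy; // e.g. "BACKWARD"
    private final List<SchemaVersion> versions = new ArrayList<>();

    SchemaMetadata(String name, String type, String schemaGroup, String compatibilityPolicy) {
        this.name = name;
        this.type = type;
        this.schemaGroup = schemaGroup;
        this.compatibilityPolicy = compatibilityPolicy;
    }

    // Registers the text as the next version, or returns the existing
    // version when the same text (same fingerprint) was registered before.
    SchemaVersion addVersion(String text) {
        String fp = fingerprint(text);
        for (SchemaVersion v : versions) {
            if (v.fingerprint.equals(fp)) {
                return v;
            }
        }
        SchemaVersion added = new SchemaVersion(versions.size() + 1, text, fp);
        versions.add(added);
        return added;
    }

    // Assumption: SHA-256 over the raw text; a real registry may normalize the schema first.
    static String fingerprint(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, registering the same schema text twice returns the version that already exists rather than creating a new one.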
Sender/Receiver flow
[Diagram: on the sender side, a Producer with a Serializer and a Schema Registry Client backed by a local schema/serdes cache; messages in the Message Store carry a version and a payload; on the receiver side, a Consumer with a Deserializer and a Schema Registry Client backed by its own local schema/serdes cache; both clients talk to SchemaRegistry instances, which use Schema Storage and SerDes Storage]
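The flow in the diagram (version plus payload in the message, with a local cache in front of the registry) might look roughly like this. The 4-byte version prefix and all class and method names are assumptions made for illustration; the actual wire format is not shown on the slide.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical wire format: 4-byte schema version followed by the payload bytes.
class VersionedMessage {
    static byte[] write(int schemaVersion, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(schemaVersion)
                .put(payload)
                .array();
    }

    static int readVersion(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    static byte[] readPayload(byte[] message) {
        byte[] payload = new byte[message.length - 4];
        System.arraycopy(message, 4, payload, 0, payload.length);
        return payload;
    }
}

// Local schema cache: asks the registry only the first time a version is seen.
class CachingSchemaLookup {
    private final Map<Integer, String> cache = new HashMap<>();
    private final IntFunction<String> registry; // stand-in for a remote registry call
    int registryCalls = 0;

    CachingSchemaLookup(IntFunction<String> registry) {
        this.registry = registry;
    }

    String schemaFor(int version) {
        String cached = cache.get(version);
        if (cached == null) {
            registryCalls++;
            cached = registry.apply(version);
            cache.put(version, cached);
        }
        return cached;
    }
}
```

The receiver reads the version first, resolves the schema through the cache, and only then decodes the payload, so the schema itself never travels with each message.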
Schema Registry Component Architecture
[Diagram: the SR Web Server hosts the Schema Registry Web App and REST API; the Schema Registry Client is a Java client; integrations: NiFi processors, Kafka Ser/Des, StreamLine; pluggable schema storage: MySQL, Postgres, In-Memory; serializer/deserializer jar storage: local file system, HDFS]
Schema Compatibility Policies
• What is a compatibility policy?
  – Defines the rules for how schemas can evolve
  – Subsequent version updates have to honor the schema's original compatibility policy
• Policies supported
  – Backward
  – Forward
  – Both
  – None
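A minimal sketch of what the four policies could mean. It assumes Avro-style reading rules reduced to one check (a reader can handle data from a writer if every reader field missing from the writer has a default) and models a schema as a map from field name to "has a default". The names and the rule are assumptions, not the Registry's implementation.

```java
import java.util.Map;

enum CompatibilityPolicy { BACKWARD, FORWARD, BOTH, NONE }

// Toy schema model (assumption): field name -> whether the field has a default value.
class CompatibilityChecker {

    // A reader can read a writer's data if every reader field that the
    // writer does not provide has a default to fall back on.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            if (!writer.containsKey(field.getKey()) && !field.getValue()) {
                return false;
            }
        }
        return true;
    }

    static boolean isCompatible(CompatibilityPolicy policy,
                                Map<String, Boolean> oldSchema,
                                Map<String, Boolean> newSchema) {
        switch (policy) {
            case BACKWARD: // new schema reads data written with the old schema
                return canRead(newSchema, oldSchema);
            case FORWARD:  // old schema reads data written with the new schema
                return canRead(oldSchema, newSchema);
            case BOTH:
                return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
            default:       // NONE: any change is accepted
                return true;
        }
    }
}
```

Under these assumptions, adding a field without a default is forward compatible but not backward compatible; adding it with a default satisfies both.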
Schema evolution
[Diagram: producers and consumers at different schema versions: Producer v2, Consumer v2, Producer v1, Producer v4, Consumer v5, Producer v1, Consumer v7]
Serializers/Deserializers
• Snapshot-based serializer/deserializer
  – Serializes the complete payload
  – Deserializes the payload to the respective type
• Pull-based serializer/deserializer
  – Serializes whatever elements are required and ignores the other elements
  – Pulls out whatever elements are required to build the desired object
• Push-based deserializer
  – Gives callbacks to receive parsing events for the respective fields in the schema
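The difference between the three styles can be illustrated on the deserializer side with a toy `key=value,key=value` payload. The payload format and every name below are hypothetical, and the sketch assumes well-formed input; it is only meant to contrast the three calling patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Toy payload format (assumption): "key=value,key=value", always well formed.
class DeserializerStyles {

    // Push style: the parser calls back once per field as it is parsed.
    static void push(String payload, BiConsumer<String, String> onField) {
        if (payload.isEmpty()) {
            return;
        }
        for (String pair : payload.split(",")) {
            int eq = pair.indexOf('=');
            onField.accept(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }

    // Snapshot style: the complete payload is turned into one object.
    static Map<String, String> snapshot(String payload) {
        Map<String, String> all = new LinkedHashMap<>();
        push(payload, all::put);
        return all;
    }

    // Pull style: only the requested elements are extracted, the rest is ignored.
    static Map<String, String> pull(String payload, Set<String> wanted) {
        Map<String, String> selected = new LinkedHashMap<>();
        push(payload, (key, value) -> {
            if (wanted.contains(key)) {
                selected.put(key, value);
            }
        });
        return selected;
    }
}
```

Here the snapshot and pull variants are built on top of the push callback purely to keep the sketch short; real implementations would not necessarily be layered this way.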
Schema registry client
• REST-based client
• Caching
  – Metadata
  – Schema versions
  – Ser/des libs and class loaders
• URL selectors
  – Round robin
  – Failover
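The two URL selection strategies could be sketched as follows. The `UrlSelector` contract and its method names here are illustrative assumptions, not the client's actual API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector contract; the method names are illustrative only.
interface UrlSelector {
    String select();                // URL to use for the next request
    void reportFailure(String url); // tell the selector that a request to url failed
}

// Round robin: every call moves on to the next registry URL.
class RoundRobinSelector implements UrlSelector {
    private final List<String> urls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<String> urls) {
        this.urls = urls;
    }

    public String select() {
        return urls.get(Math.floorMod(next.getAndIncrement(), urls.size()));
    }

    public void reportFailure(String url) {
        // nothing to do: the next select() already moves to another URL
    }
}

// Failover: keep using one URL until it is reported as failed.
class FailoverSelector implements UrlSelector {
    private final List<String> urls;
    private int current = 0;

    FailoverSelector(List<String> urls) {
        this.urls = urls;
    }

    public synchronized String select() {
        return urls.get(current);
    }

    public synchronized void reportFailure(String url) {
        if (urls.get(current).equals(url)) {
            current = (current + 1) % urls.size();
        }
    }
}
```

Round robin spreads requests across all registry instances, while failover sticks to one instance and switches only after an error is reported.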
HA
• Storage provider
  – Depends on the transactional support of the underlying SQL stores
  – Spin up as many schema registry instances as required
• Supports HA at the SchemaRegistry level
  – Using ZK/Curator
  – Automatic failover of the master
  – The master gets all writes
  – Slaves receive only reads
[Diagram: several SchemaRegistry instances in front of shared storage]
Integration of Schema Registry
• Kafka
  – Uses the producer/consumer API for the serializer/deserializer
• NiFi processors for Schema Registry
  – Fetch schema
  – Serialize/deserialize with schema
• StreamLine processors for Schema Registry
  – Look up the schema of a Kafka, Kinesis, or EventHubs topic
  – Look up the schema of an HDFS directory
Schema Registry UI
WIP/Future enhancements
• Security
  – Kerberos support
  – Default authorizers and Apache Ranger support
• Audit of schemas and clients
• Rich types in schema definitions
• Pluggable listeners
• Schema policies
• Notifications
  – New versions
  – Archiving
• Converters
Try it out!
• It's open source under the Apache License
• https://github.com/hortonworks/registry
• Apache incubation soon
• Registry 0.2 release on April 25th, 0.3 release on May 31st
• https://groups.google.com/forum/#!forum/registry
• We are seeing outside contributions
• Contributions are welcome!
Streaming Analytics Manager
• What is it?
  – A platform to design, develop, deploy, and manage streaming analytics applications in minutes using a drag-and-drop visual paradigm
  – Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
  – Agnostic to the underlying streaming engine; can support multiple streaming substrates (e.g., Storm, Spark Streaming, Flink)
  – Extensibility is a first-class citizen (add sinks, processors, and sources as needed)
• Guiding principle
  – Build streaming applications easily while focusing on business logic
Complexities in building streaming applications
• New streaming engines and APIs
• Implementing windows, joins, and state management is hard
• Adding the user's business logic to the application
• Interaction with external services such as HBase, Hive, HDFS, etc.
• Deploying with all the necessary configuration files
• Operations around the streaming application, including monitoring and metrics
• Debugging streaming applications
Key challenges that SAM is trying to solve
• Building streaming applications requires specialized skill sets that most enterprise organizations don't have today
• Streaming applications require a considerable amount of programming, testing, and tuning before they can be deployed to production, which takes a significant amount of time
• Key streaming primitives such as joining/splitting streams, aggregations over a window of time, and pattern matching are difficult to implement
• People prefer not to write code to build complex streaming applications
• No true open source project today solves all of the above challenges
• People care less about which streaming engine powers their streaming applications, as long as the challenges above are addressed and they are not forced into vendor lock-in
Streaming Analytics Manager Components and User Personas
[Diagram: user personas (App Developer, Business Analyst, Operations) above a Distributed Streaming Computation Engine layer, i.e. the different streaming engines that power the higher-level services used to build stream applications]
SAM's Value Proposition
• A platform with a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
• Makes it easier for users to import other service configurations and use them in streaming applications
• Provides an abstraction over the streaming engine used, which makes it possible to plug in open source streaming engines (Storm, Spark, Flink, etc.)
• Decouples the schema from the streaming application via integration with Schema Registry
• Provides operational metrics to monitor streaming applications via pluggable metrics storage (e.g., Ambari, OpenTSDB)
• Streaming insights: visualize the data being processed by the streaming application
SAM's Key Capabilities
• Building streaming apps using the following primitives
  – Connecting to streams
  – Joining streams
  – Forking streams
  – Aggregations over windows
  – Stream analytics: descriptive, predictive, prescriptive
  – Rules engine
  – Transformations
  – Filtering and routing
  – Notifications/alerts
• Deploying streaming apps
  – Deploying the streaming app on a supported streaming engine
  – Monitoring the streaming app with metrics
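To make "aggregations over windows" concrete, here is a plain-Java sketch of a tumbling-window sum. It only illustrates what such an aggregate computes; it is not how SAM or the underlying streaming engine implements windows, and the class and method names are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a tumbling window assigns each event to exactly one
// fixed-size, non-overlapping time bucket and aggregates per bucket.
class TumblingWindowSum {

    // Returns window start time (ms) -> sum of the values that fall in that window.
    static Map<Long, Double> sumPerWindow(long[] timestampsMs, double[] values, long windowMs) {
        Map<Long, Double> sums = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            long windowStart = Math.floorDiv(timestampsMs[i], windowMs) * windowMs;
            sums.merge(windowStart, values[i], Double::sum);
        }
        return sums;
    }
}
```

With one-second windows, events at 0 ms and 500 ms land in the window starting at 0, while an event at 1000 ms opens the next window.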
Typical Streaming Application Workflow
[Diagram: a pipeline with the stages Kafka, P1, W1, HBase]
SAM's Service Pools and Environments
• Service Pool
  – A pool of services that can be used to create different environments
• Environment
  – Consists of a set of services you choose from one or more service pools
• Stream App
  – The environment is then associated with a stream application, which uses the services in that environment for its various configurations
[Diagram: Stream App 1, Stream App 2]
SAM's Components
• Builder Components
  – Sources (all integrated with Schema Registry)
    • Kafka Source
    • Event Hub
    • HDFS
  – Processors
    • Join
    • Window/Aggregate
    • Rule
    • Normalization/Projection
    • Branch
    • PMML
    • Custom
  – Sinks
    • Notification/Alerts (email support)
    • HDFS
    • HBase
    • Hive
    • JDBC
    • Druid
    • Cassandra
    • Kafka
    • OpenTSDB
    • Solr
Streaming Analytics powered by Druid and Superset
• What is Stream Insight?
  – A tool for business analysts to perform descriptive analytics on streaming data and surface perishable insights, using a sophisticated UI provided by Superset
  – Tooling to create time-series and real-time analytics dashboards, charts, and graphs, and to build rich, customizable visualizations of the data
Demo
Extensibility with SAM SDK
• Custom Processor
  – Allows users to write their own business logic

import java.util.List;
import java.util.Map;

/**
 * Interface for processors to implement for processing messages at runtime
 */
public interface ProcessorRuntime {
    /**
     * Process the {@link StreamlineEvent} and throw a {@link ProcessingException}
     * if an error arises during processing
     *
     * @param event to be processed
     * @return the results produced for the event
     * @throws ProcessingException if an error arises during processing
     */
    List<Result> process(StreamlineEvent event) throws ProcessingException;

    /**
     * Initialize any necessary resources needed for the implementation
     *
     * @param config configuration for the implementation
     */
    void initialize(Map<String, Object> config);

    /**
     * Clean up any necessary resources needed for the implementation
     */
    void cleanup();
}
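A custom processor written against this interface might look like the sketch below. To keep it self-contained, `StreamlineEvent`, `Result`, and `ProcessingException` are replaced by minimal hypothetical stand-ins (the real SDK types are richer), and the `speedLimit` config key, the `speed` field, and the `default` stream name are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-ins so the sketch compiles on its own; the SDK's real types are richer.
class ProcessingException extends Exception {
    ProcessingException(String message) {
        super(message);
    }
}

class StreamlineEvent {
    final Map<String, Object> fields;

    StreamlineEvent(Map<String, Object> fields) {
        this.fields = fields;
    }
}

class Result {
    final String stream;
    final List<StreamlineEvent> events;

    Result(String stream, List<StreamlineEvent> events) {
        this.stream = stream;
        this.events = events;
    }
}

// Same three methods as the interface on the slide.
interface ProcessorRuntime {
    List<Result> process(StreamlineEvent event) throws ProcessingException;
    void initialize(Map<String, Object> config);
    void cleanup();
}

// Hypothetical custom processor: flags events whose "speed" exceeds a configured limit.
class SpeedingProcessor implements ProcessorRuntime {
    private double limit;

    @Override
    public void initialize(Map<String, Object> config) {
        // "speedLimit" is an assumed config key for this example
        limit = ((Number) config.get("speedLimit")).doubleValue();
    }

    @Override
    public List<Result> process(StreamlineEvent event) throws ProcessingException {
        Object speed = event.fields.get("speed");
        if (!(speed instanceof Number)) {
            throw new ProcessingException("event has no numeric 'speed' field");
        }
        // Copy the incoming fields and add the computed flag
        Map<String, Object> enriched = new HashMap<>(event.fields);
        enriched.put("speeding", ((Number) speed).doubleValue() > limit);
        return Collections.singletonList(
                new Result("default", Collections.singletonList(new StreamlineEvent(enriched))));
    }

    @Override
    public void cleanup() {
        // nothing to release in this sketch
    }
}
```

The business logic lives entirely in `process`; `initialize` and `cleanup` bracket it with setup and teardown.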
Extensibility with SAM SDK
• Window UDF
  – Custom UDFs to process window data

/**
 * This is an interface for implementing user defined functions for a single argument.
 *
 * @param <O> type of the result
 * @param <I> type of the input argument
 */
public interface UDF<O, I> {
    O evaluate(I i);
}
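As a small illustration, a UDF implementing this interface could look like the following. The interface is repeated so the sketch stands on its own, and the Celsius-to-Fahrenheit function is an invented example rather than one of the built-in functions.

```java
// The single-argument UDF interface from the slide, repeated so the sketch is self-contained.
interface UDF<O, I> {
    O evaluate(I i);
}

// Hypothetical UDF: converts a Celsius reading to Fahrenheit.
class CelsiusToFahrenheit implements UDF<Double, Double> {
    @Override
    public Double evaluate(Double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```

Note that the result type comes first in `UDF<O, I>`, so a function from `String` to `Integer` would be declared as `UDF<Integer, String>`.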
Built-in functions
• STDDEV
• STDDEVP
• VARIANCE
• VARIANCEP
• MEAN
• MIN
• MAX
• SUM
• COUNT
• UPPER
• LOWER
• INITCAP
• SUBSTRING
• CHAR_LENGTH
• CONCAT
Extensibility with SAM SDK
• Notification Sink
  – Interface for sending notifications such as email and SMS, as well as more complex ones that invoke external APIs

import java.util.List;
import java.util.Map;

public interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

public interface Notification {
    enum Status {
        NEW, DELIVERED, FAILED
    }
    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}
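A notifier implementation could be sketched as below. The `Notifier` interface matches the slide, but `NotificationContext` and `Notification` are trimmed, hypothetical stand-ins, and the meanings given to `isPull()` and `getFields()` in the comments are assumptions, since the slide does not define them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Trimmed, hypothetical stand-ins for the SDK types shown on the slide.
class NotificationContext {
}

interface Notification {
    String getId();
    Map<String, Object> getFieldsAndValues();
}

// Same methods as the Notifier interface on the slide.
interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

// Hypothetical notifier that records one text line per notification
// instead of sending an email or SMS.
class InMemoryNotifier implements Notifier {
    private NotificationContext context;
    final List<String> delivered = new ArrayList<>();

    @Override
    public void open(NotificationContext ctx) {
        this.context = ctx;
    }

    @Override
    public void notify(Notification notification) {
        delivered.add(notification.getId() + " " + notification.getFieldsAndValues());
    }

    @Override
    public void close() {
        this.context = null;
    }

    @Override
    public boolean isPull() {
        return false; // assumption: false means notifications are pushed to this notifier
    }

    @Override
    public List<String> getFields() {
        return Collections.singletonList("message"); // assumption: fields this notifier uses
    }

    @Override
    public NotificationContext getContext() {
        return context;
    }
}
```

A real sink would replace the body of `notify` with the call to the mail server, SMS gateway, or external API.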
What's Next?
• Manual service pool registration that does not require Ambari
• Test sources and sinks to easily test the functionality of a streaming app
• Authentication and authorization
• Other components: sources (e.g., Kinesis), processors, and sinks
Try it out!
• It's open source under the Apache License
• https://github.com/hortonworks/streamline
• Apache incubation soon
• SAM 0.4 is out!
• https://groups.google.com/forum/#!forum/streamline-users
• Contributions are welcome!
Follow-up questions
• JP Player, Principal Solutions Engineer
  jplayer@hortonworks.com
  650.773.3313
• Sam Hjelmfelt, Resident Architect
  shjelmfelt@hortonworks.com
  605.393.7244
• Kristine Hannigan, Enterprise Account Manager
  khannigan@hortonworks.com
  415.323.8819
More Related Content

What's hot (20)

PPTX
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks
 
PPTX
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
PPTX
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks
 
PDF
Apache NiFi: latest developments for flow management at scale
Abdelkrim Hadjidj
 
PPTX
Integrating NiFi and Flink
Bryan Bende
 
PDF
Apache NiFi SDLC Improvements
Bryan Bende
 
PPTX
Manage democratization of the data - Data Replication in Hadoop
DataWorks Summit
 
PDF
HDF: Hortonworks DataFlow: Technical Workshop
Hortonworks
 
PDF
What is New in Apache Hive 3.0?
DataWorks Summit
 
PPTX
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
PPTX
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
PPTX
Webinar Series Part 5 New Features of HDF 5
Hortonworks
 
PDF
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
PPTX
Sharing metadata across the data lake and streams
DataWorks Summit
 
PPTX
Mission to NARs with Apache NiFi
Hortonworks
 
PPT
Enabling a hardware accelerated deep learning data science experience for Apa...
DataWorks Summit
 
PPTX
Standalone metastore-dws-sjc-june-2018
alanfgates
 
PDF
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
 
PPTX
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
PDF
Introduction to HDF 3.0
Timothy Spann
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks
 
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks
 
Apache NiFi: latest developments for flow management at scale
Abdelkrim Hadjidj
 
Integrating NiFi and Flink
Bryan Bende
 
Apache NiFi SDLC Improvements
Bryan Bende
 
Manage democratization of the data - Data Replication in Hadoop
DataWorks Summit
 
HDF: Hortonworks DataFlow: Technical Workshop
Hortonworks
 
What is New in Apache Hive 3.0?
DataWorks Summit
 
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
Webinar Series Part 5 New Features of HDF 5
Hortonworks
 
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
Sharing metadata across the data lake and streams
DataWorks Summit
 
Mission to NARs with Apache NiFi
Hortonworks
 
Enabling a hardware accelerated deep learning data science experience for Apa...
DataWorks Summit
 
Standalone metastore-dws-sjc-june-2018
alanfgates
 
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
 
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
Introduction to HDF 3.0
Timothy Spann
 

Similar to Schema Registry & Stream Analytics Manager (20)

PPTX
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
PPTX
SAM - Streaming Analytics Made Easy
DataWorks Summit
 
PPTX
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
PPTX
Unlocking insights in streaming data
Carolyn Duby
 
PPTX
SAM—streaming analytics made easy
DataWorks Summit
 
PPTX
Sharing metadata across the data lake and streams
DataWorks Summit
 
PDF
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
PPTX
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
PPTX
Make Streaming Analytics work for you: The Devil is in the Details
DataWorks Summit/Hadoop Summit
 
PDF
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
PDF
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
 
PPTX
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
PDF
Stream analytics
rebeccatho
 
PDF
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
DataWorks Summit
 
PDF
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
PDF
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
PDF
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
PDF
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
PDF
Introduction to Streaming Analytics Manager
Yifeng Jiang
 
PDF
Introduction to Streaming Analytics
Guido Schmutz
 
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
SAM - Streaming Analytics Made Easy
DataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
Unlocking insights in streaming data
Carolyn Duby
 
SAM—streaming analytics made easy
DataWorks Summit
 
Sharing metadata across the data lake and streams
DataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Make Streaming Analytics work for you: The Devil is in the Details
DataWorks Summit/Hadoop Summit
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Stream analytics
rebeccatho
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
DataWorks Summit
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Introduction to Streaming Analytics Manager
Yifeng Jiang
 
Introduction to Streaming Analytics
Guido Schmutz
 
Ad

Recently uploaded (20)

PPTX
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
PPTX
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
PPTX
Types of Bearing_Specifications_PPT.pptx
PranjulAgrahariAkash
 
PPTX
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
PPTX
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
PPTX
GitOps_Without_K8s_Training_detailed git repository
DanialHabibi2
 
DOCX
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
PPTX
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
PDF
Reasons for the succes of MENARD PRESSUREMETER.pdf
majdiamz
 
PPTX
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
PPTX
GitOps_Repo_Structure for begeinner(Scaffolindg)
DanialHabibi2
 
PPTX
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
PPT
PPT2_Metal formingMECHANICALENGINEEIRNG .ppt
Praveen Kumar
 
PDF
Introduction to Productivity and Quality
মোঃ ফুরকান উদ্দিন জুয়েল
 
PDF
GTU Civil Engineering All Semester Syllabus.pdf
Vimal Bhojani
 
PPTX
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
PPTX
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
PPTX
Thermal runway and thermal stability.pptx
godow93766
 
PDF
Ethics and Trustworthy AI in Healthcare – Governing Sensitive Data, Profiling...
AlqualsaDIResearchGr
 
PPTX
Lecture 1 Shell and Tube Heat exchanger-1.pptx
mailforillegalwork
 
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
Types of Bearing_Specifications_PPT.pptx
PranjulAgrahariAkash
 
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
GitOps_Without_K8s_Training_detailed git repository
DanialHabibi2
 
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
Reasons for the succes of MENARD PRESSUREMETER.pdf
majdiamz
 
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
GitOps_Repo_Structure for begeinner(Scaffolindg)
DanialHabibi2
 
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
PPT2_Metal formingMECHANICALENGINEEIRNG .ppt
Praveen Kumar
 
Introduction to Productivity and Quality
মোঃ ফুরকান উদ্দিন জুয়েল
 
GTU Civil Engineering All Semester Syllabus.pdf
Vimal Bhojani
 
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
Thermal runway and thermal stability.pptx
godow93766
 
Ethics and Trustworthy AI in Healthcare – Governing Sensitive Data, Profiling...
AlqualsaDIResearchGr
 
Lecture 1 Shell and Tube Heat exchanger-1.pptx
mailforillegalwork
 
Ad

Schema Registry & Stream Analytics Manager

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager (SAM) & Registry
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Registry Streaming Analytics Manager (SAM) Demo Questions
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved History of Streaming at Hortonworks  Introduced Storm as Stream Processing Engine in HDP-2.1 (Late 2013)  First to ship Apache Kafka as Enterprise Messaging Queue ( Early 2014)  Added several improvements & features into Apache Storm.  Added Security and critical features/improvements to Apache Kafka  Lot of learnings from shipping Storm & Kafka for past 3 years  Vision & Implementation of Registry & Streaming Analytics Manager based on our learnings from shipping Storm & Kafka for past 3 years.
  • 4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Registry
  • 5. Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Registry  Foundational service to enable multiple use-cases including Streaming, Machine Learning, Service discovery, Application templates  Offers base frameworks to develop Schema Registry, ML Registry etc..  Registry modules like Schema Registry, ML Registry build their own entities on top of versioned entity  Modular approach to running registry services.  Users will have flexibility to choose what registry services they would like to enable.  We have Schema Registry and ML Registry
  • 6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is Schema Registry? What Value Does it Provide?  What is Schema Registry? • A shared repository of schemas that allows applications to flexibly interact with each other  What Value does Schema Registry Provide? – Central Metadata Repository • Provide reusable schema • Define relationship between schemas • Enable generic format conversion, and generic routing – Operational Efficiency • To avoid attaching schema to every piece of data • Producers and consumers can evolve at different rates  Example Use – Register Schemas for Kafka Topics to be used by consumers of Kafka Topic (e.g: Nifi, StreamLine)
  • 7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Concepts • Schema Group A logical grouping/container for similar type of schemas or based any criteria that the customer has from managing the schemas • Schema Metadata Metadata associated with a named schema. • Schema Version The actual versioned schema associated a schema meta definition Schema Metadata 1 Schema Name Schema Type Description Compatibility Policy Serializers Deserializers Schema Group Group Name SchemaVersion 3 SchemaVersion 2 Schema Version 1 version text Fingerprint
  • 8. Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Sender/Receiver flow Local schema/serdes cache Serializer Producer Schema Registry Client Message Store Local schema/serdes cache Deserializer Schema Registry Client version payloa d version payloa d Schema Storage SerDes Storage Consumer SchemaRegist ry SchemaRegist ry SchemaRegist ry
  • 9. Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Schema Registry Component Architecture SR Web Server Schema Registry Web App REST APISchema Registry Client Java Client Integrations Nifi Processors Kafka Ser/Des StreamLine Schema Storage Pluggable Storage Serializer/Deserializer Jar Storage MySQL In-Memory Local File System HDFSPostgre s
  • 10. Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Compatibility Policies  What is a Compatibility Policy? – Defines the rules of how the schemas can evolve – Subsequent version updates has to honor the schema’s original compatibility.  Policies Supported – Backward – Forward – Both – None
  • 11. Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema evolution Producer v2 Consumer v2 Producer v1 Producer v4 Consumer v5 Producer v1 Consumer v7
  • 12. Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Serializers/Deserializers  Snapshot based serializer/deserializer – Seriliazes the complete payload – Deserializes the payload to respective type  Pull based serializer/deserializer – Serialize whatever elements are required and ignore other elements – Pull out whatever elements that are required to build the desired object  Push based deserializer – Gives callback to receive parsing events for respective fields in schema
  • 13. Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema registry client  REST based client  Caching – Metadata – Schema versions – Ser/des libs and class loaders  URL selectors – Round robin – Failover
  • 14. Page19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HA  Storage provider – Depends on transactional support of underlying SQL stores – Spinup required schema registry instances  Supports HA at SchemaRegistry – Using ZK/Curator – Automatic failover of master – Master gets all writes – Slaves receives only reads SchemaRegistr y storage SchemaRegistr y SchemaRegistr y SchemaRegistr y SchemaRegistr ySchemaRegistr y storage
  • 15. Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Integration of Schema Registry  Kafka – Using producer/consumer API for serializer/deserializer  Nifi Processors for Schema Registry – Fetch Schema – Serialize/Deserialize with Schema  StreamLine processors for Schema Registry – Lookup Schema of a Kafka, Kinesis, EventHubs Topic – Lookup Schema of a HDFS Directory
  • 16. Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry UI
  • 17. Page25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved WIP/Future enhancements  Security – Kerberos support – Default authorizers and Apache Ranger support  Audit of Schemas & Clients  Rich Types in Schema definition  Pluggable Listeners  Schema Policies  Notifications – New versions – Archiving  Converters
  • 18. Page26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Try it out!  Its open source under Apache License  https://github.com/hortonworks/registry  Apache incubation soon  Registry 0.2 release April 25th, 0.3 release on May 31st  https://groups.google.com/forum/#!forum/registry  We are seeing outside contributions  Contributions are welcome!
  • 19. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager
  • 20. Page28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager  What is it? • A platform used to design, develop, deploy and manage streaming analytics applications using a drag drop visualize paradigm in minutes • Supports event correlation, context enrichment , complex pattern matching, analytical aggregations and alerts/notifications when insights are discovered. • It is agnostic to the underlying streaming engine and can support multiple streaming substrates (e.g: Storm, Spark Streaming, Flink) • Extensibility is a first class citizen (add sinks, processors, sources as needed)  Guiding Principle – Build streaming applications easily while focusing on business logic
  • 21. Page29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Complexities in building streaming applications  New streaming engines and APIs  Implementing windows, joins, and state management is hard  Adding user’s business logic into the application  Interaction with external services such as HBase, Hive, HDFS etc  Deploying with all the necessary configuration files  Operations around the streaming application including monitoring and metrics  Debugging streaming application
  • 22. Page30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Key challenges that SAM is trying to solve  Building streaming applications requires specialized skillsets that most enterprise organizations don’t have today  Streaming applications require considerable amount of programming, testing and tuning before deploying to production which takes a significant amount of time  Key streaming primitives such as joining/splitting streams, aggregations over a window of time and pattern matching are difficult to implement  People don’t prefer to code to build complex streaming applications  No true open source project today solves all of the above challenges  People don’t care about the streaming engine that powers streaming applications so much as long challenges above are addressed and doesn’t force them into vendor lock in.
  • 23. Page31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager Components and User Personas Distributed Streaming Computation Engine (Different Streaming Engines that powers higher level services to build stream application. ) App Developer Business Analyst Operations
  • 24. SAM’s Value Proposition
      • A platform with a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
      • Makes it easier for users to import other service configurations and use them in streaming applications
      • Provides an abstraction over the streaming engine, with the ability to plug in open source streaming engines (Storm, Spark, Flink, etc.)
      • Decouples schema from the streaming application via integration with Schema Registry
      • Provides operational metrics for monitoring streaming applications via pluggable metrics storage (e.g., Ambari, OpenTSDB)
      • Streaming Insights: visualizes the data being processed by the streaming application
  • 25. SAM’s Key Capabilities
      • Building streaming apps using the following primitives
        – Connecting to streams
        – Joining streams
        – Forking streams
        – Aggregations over windows
        – Stream analytics: descriptive, predictive, prescriptive
        – Rules engine
        – Transformations
        – Filtering and routing
        – Notifications / alerts
      • Deploying streaming apps
        – Deploying the streaming app on a supported streaming engine
        – Monitoring the streaming app with metrics
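To make the "aggregations over windows" primitive concrete, here is a minimal plain-Java sketch of a tumbling (fixed-size, non-overlapping) window average. It is an illustration of the concept only, not how SAM or any streaming engine implements windowing; the class `TumblingWindowAverage` and its methods are hypothetical names introduced for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of SAM): groups timestamped values into
// fixed-size, non-overlapping windows and averages each window.
class TumblingWindowAverage {
    private final long windowMillis;
    // window start time -> {running sum, running count}
    private final Map<Long, double[]> sumAndCount = new TreeMap<>();

    TumblingWindowAverage(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Assign the value to the window that contains its timestamp.
    void add(long timestampMillis, double value) {
        long windowStart = Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
        double[] acc = sumAndCount.computeIfAbsent(windowStart, k -> new double[2]);
        acc[0] += value;
        acc[1] += 1;
    }

    // Average per window, keyed by window start time in ascending order.
    Map<Long, Double> averages() {
        Map<Long, Double> out = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : sumAndCount.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return out;
    }
}
```

With a 1000 ms window, events at timestamps 0 and 999 fall into the window starting at 0, while an event at 1000 opens the next window. A real engine would additionally handle late events, state checkpointing, and window eviction, which is part of why the deck calls these primitives hard to implement by hand.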
  • 26. Typical Streaming Application Workflow
      (Diagram labels: Kafka, P1, W1, HBase)
  • 27. SAM’s Service Pools and Environments
      (Diagram labels: Stream App 1, Stream App 2)
      • Service Pool
        – A pool of services that can be used to create different environments
      • Environment
        – Consists of a set of services chosen from one or more service pools
      • Stream App
        – An environment is associated with a stream application, which then uses the services in that environment for its configuration
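The relationship between service pools, environments, and stream apps can be sketched as a small data model. This is only an illustration of the description on the slide: the classes `ServicePool`, `Environment`, and `StreamApp`, their methods, and the service names used with them are hypothetical and are not taken from SAM's code base.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a named pool offering a set of services.
class ServicePool {
    final String name;
    final Set<String> services;

    ServicePool(String name, Set<String> services) {
        this.name = name;
        this.services = Collections.unmodifiableSet(new HashSet<>(services));
    }
}

// Hypothetical model: an environment picks services from one or more pools.
class Environment {
    final String name;
    private final Map<String, ServicePool> selected = new LinkedHashMap<>();

    Environment(String name) {
        this.name = name;
    }

    // Select a service from a pool; the pool must actually offer it.
    Environment use(String service, ServicePool pool) {
        if (!pool.services.contains(service)) {
            throw new IllegalArgumentException(pool.name + " does not offer " + service);
        }
        selected.put(service, pool);
        return this;
    }

    ServicePool poolFor(String service) {
        return selected.get(service);
    }
}

// Hypothetical model: a stream app resolves its services through its environment.
class StreamApp {
    final String name;
    final Environment environment;

    StreamApp(String name, Environment environment) {
        this.name = name;
        this.environment = environment;
    }

    // Name of the pool that backs the given service for this app.
    String resolve(String service) {
        ServicePool pool = environment.poolFor(service);
        if (pool == null) {
            throw new IllegalStateException(service + " is not part of environment " + environment.name);
        }
        return pool.name;
    }
}
```

In this sketch an environment can take Kafka from one pool and HBase from another, and every app bound to that environment sees the same choices, which mirrors the slide's point that the environment, not the app, carries the service configuration.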
  • 30. SAM’s Components
      • Builder Components
        – Source
            Kafka Source, Event Hub, HDFS
            All integrated with Schema Registry
        – Processor
            Join, Window/Aggregate, Rule, Normalization/Projection, Branch, PMML, Custom
        – Sinks
            Notification/Alerts (Email Support), HDFS, HBase, Hive, JDBC, Druid, Cassandra, Kafka, OpenTSDB, Solr
  • 34. Streaming Analytics powered by Druid and Superset
      • What is Stream Insight?
        – Gives business analysts a tool for descriptive analytics on streaming data and perishable insights, using a sophisticated UI provided by Superset
        – Tooling to create time-series and real-time analytics dashboards, charts, and graphs, and to build rich, customizable visualizations of data
  • 36. Demo
  • 37. Extensibility with SAM SDK
      • Custom Processor
        – Allows users to write their own business logic

        /**
         * Interface for processors to implement for processing messages at runtime
         */
        public interface ProcessorRuntime {
            /**
             * Process the {@link StreamlineEvent} and throw a {@link ProcessingException}
             * if an error arises during processing
             *
             * @param event to be processed
             * @return the results of processing the event
             * @throws ProcessingException if an error arises during processing
             */
            List<Result> process(StreamlineEvent event) throws ProcessingException;

            /**
             * Initialize any necessary resources needed for the implementation
             *
             * @param config configuration for the implementation
             */
            void initialize(Map<String, Object> config);

            /**
             * Clean up any necessary resources needed for the implementation
             */
            void cleanup();
        }
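As a rough illustration of what a custom processor might look like, the sketch below implements the `ProcessorRuntime` interface from the slide. To keep it compilable on its own, `StreamlineEvent`, `Result`, and `ProcessingException` are replaced with simplified stand-ins whose shapes are assumptions, not the actual SAM SDK classes. The processor `CelsiusToFahrenheitProcessor`, the config keys `inputField` and `outputField`, and the stream name `default` are likewise hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins so the sketch compiles alone.
// Their shapes are assumptions, NOT the real SAM SDK types.
class StreamlineEvent {
    private final Map<String, Object> fields;
    StreamlineEvent(Map<String, Object> fields) { this.fields = new HashMap<>(fields); }
    Object get(String key) { return fields.get(key); }
    Map<String, Object> asMap() { return Collections.unmodifiableMap(fields); }
}

class Result {
    final String stream;
    final List<StreamlineEvent> events;
    Result(String stream, List<StreamlineEvent> events) {
        this.stream = stream;
        this.events = events;
    }
}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

// Interface methods as shown on the slide.
interface ProcessorRuntime {
    List<Result> process(StreamlineEvent event) throws ProcessingException;
    void initialize(Map<String, Object> config);
    void cleanup();
}

// Hypothetical custom processor: adds a Fahrenheit field computed from a Celsius field.
class CelsiusToFahrenheitProcessor implements ProcessorRuntime {
    private String inputField = "celsius";
    private String outputField = "fahrenheit";

    @Override
    public void initialize(Map<String, Object> config) {
        if (config == null) return;
        // Hypothetical config keys; fall back to the defaults when absent.
        Object in = config.get("inputField");
        Object out = config.get("outputField");
        if (in != null) inputField = in.toString();
        if (out != null) outputField = out.toString();
    }

    @Override
    public List<Result> process(StreamlineEvent event) throws ProcessingException {
        Object raw = event.get(inputField);
        if (!(raw instanceof Number)) {
            throw new ProcessingException("Missing or non-numeric field: " + inputField);
        }
        double fahrenheit = ((Number) raw).doubleValue() * 9.0 / 5.0 + 32.0;
        // Copy the incoming fields and add the derived one.
        Map<String, Object> enriched = new HashMap<>(event.asMap());
        enriched.put(outputField, fahrenheit);
        StreamlineEvent outEvent = new StreamlineEvent(enriched);
        return Collections.singletonList(
                new Result("default", Collections.singletonList(outEvent)));
    }

    @Override
    public void cleanup() {
        // No resources to release in this sketch.
    }
}
```

The life cycle follows the interface: `initialize` is called with the processor's configuration, `process` is called per event, and `cleanup` releases resources. Signalling bad input through `ProcessingException` rather than returning an empty list keeps malformed events visible to whatever error handling the runtime provides.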
  • 38. Extensibility with SAM SDK
      • Window UDF
        – Custom UDFs to process window data

        /**
         * This is an interface for implementing user defined functions for a single argument.
         *
         * @param <O> type of the result
         * @param <I> type of the input argument
         */
        public interface UDF<O, I> {
            O evaluate(I i);
        }

      • Built-in functions
        STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
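A single-argument UDF can be quite small. The following sketch redeclares the `UDF` interface from the slide so that it compiles on its own, and adds a hypothetical implementation, `MaskAllButLastFour`, that is not one of SAM's built-in functions.

```java
// Interface as shown on the slide, redeclared so the sketch is self-contained.
interface UDF<O, I> {
    O evaluate(I i);
}

// Hypothetical UDF: masks every character except the last four,
// e.g. for hiding most of an account number in a notification.
class MaskAllButLastFour implements UDF<String, String> {
    @Override
    public String evaluate(String value) {
        if (value == null) {
            return null;
        }
        int keep = Math.min(4, value.length());
        int maskLength = value.length() - keep;
        StringBuilder masked = new StringBuilder(value.length());
        for (int i = 0; i < maskLength; i++) {
            masked.append('*');
        }
        masked.append(value.substring(maskLength));
        return masked.toString();
    }
}
```

Note the type-parameter order on the interface: the result type `O` comes first and the input type `I` second, so a function from `Integer` to `String` would be declared as `UDF<String, Integer>`.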
  • 39. Extensibility with SAM SDK
      • Notification Sink
        – Interface for sending notifications such as email and SMS, or for more complex notifiers that invoke external APIs

        public interface Notifier {
            void open(NotificationContext ctx);
            void notify(Notification notification);
            void close();
            boolean isPull();
            List<String> getFields();
            NotificationContext getContext();
        }

        public interface Notification {
            enum Status { NEW, DELIVERED, FAILED }
            String getId();
            List<String> getEventIds();
            List<String> getDataSourceIds();
            String getRuleId();
            Status getStatus();
            Map<String, Object> getFieldsAndValues();
            String getNotifierName();
            long getTs();
        }
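A minimal notifier, sketched under several assumptions, might look like the following. The `Notifier` and `Notification` interfaces are redeclared from the slide; `NotificationContext` is not shown in the deck, so the version here is a hypothetical stand-in. The meanings given to `isPull()` and `getFields()` in the comments are guesses, and `InMemoryNotifier` and `SimpleNotification` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the slide does not show this type.
interface NotificationContext {
    Map<String, Object> getProperties();
}

// Interfaces as shown on the slide.
interface Notification {
    enum Status { NEW, DELIVERED, FAILED }
    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}

interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

// Hypothetical notifier that formats each notification and keeps it in memory.
class InMemoryNotifier implements Notifier {
    private final List<String> sent = new ArrayList<>();
    private NotificationContext context;
    private boolean open;

    @Override public void open(NotificationContext ctx) { context = ctx; open = true; }

    @Override public void notify(Notification n) {
        if (!open) throw new IllegalStateException("Notifier is not open");
        sent.add(n.getNotifierName() + " rule=" + n.getRuleId()
                + " fields=" + n.getFieldsAndValues());
    }

    @Override public void close() { open = false; }

    // Assumption: false means notifications are pushed to this notifier.
    @Override public boolean isPull() { return false; }

    // Assumption: the field names this notifier requires; none in this sketch.
    @Override public List<String> getFields() { return Collections.emptyList(); }

    @Override public NotificationContext getContext() { return context; }

    List<String> sentMessages() { return Collections.unmodifiableList(sent); }
}

// Hypothetical value object used to exercise the notifier.
class SimpleNotification implements Notification {
    private final String id, ruleId, notifierName;
    private final Map<String, Object> fields;
    private final long ts;

    SimpleNotification(String id, String ruleId, String notifierName,
                       Map<String, Object> fields, long ts) {
        this.id = id; this.ruleId = ruleId; this.notifierName = notifierName;
        this.fields = fields; this.ts = ts;
    }

    @Override public String getId() { return id; }
    @Override public List<String> getEventIds() { return Collections.emptyList(); }
    @Override public List<String> getDataSourceIds() { return Collections.emptyList(); }
    @Override public String getRuleId() { return ruleId; }
    @Override public Status getStatus() { return Status.NEW; }
    @Override public Map<String, Object> getFieldsAndValues() { return fields; }
    @Override public String getNotifierName() { return notifierName; }
    @Override public long getTs() { return ts; }
}
```

A real notifier would replace the in-memory list with a call to a mail server, an SMS gateway, or an external API inside `notify`, and would use `open` and `close` to set up and release that connection.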
  • 40. What’s Next?
      • Manual service pool registration that does not require Ambari
      • Test sources and sinks for easily testing the functionality of a streaming app
      • Authentication and authorization
      • Other components (sources such as Kinesis, processors, and sinks)
  • 41. Try it out!
      • It’s open source under the Apache License
      • https://github.com/hortonworks/streamline
      • Apache incubation coming soon
      • SAM 0.4 is out!
      • https://groups.google.com/forum/#!forum/streamline-users
      • Contributions are welcome!
  • 42. Follow-up questions
      • JP Player, Principal Solutions Engineer [email protected] 650.773.3313
      • Sam Hjelmfelt, Resident Architect [email protected] 605.393.7244
      • Kristine Hannigan, Enterprise Account Manager [email protected] 415.323.8819