SlideShare a Scribd company logo
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry
Satish Duggana, Hortonworks
Dataworks summit - 2017, Munich
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Introduction
 What is Schema Registry?
• A shared repository of schemas that allows applications to flexibly interact with each other
 What Value does Schema Registry Provide?
– Data Governance
• Provide reusable schema
• Define relationship between schemas
• Enable generic format conversion, and generic routing
– Operational Efficiency
• To avoid attaching schema to every piece of data
• Producers and consumers can evolve at different rates
 Example Use
– Register Schemas for Kafka Topics to be used by consumers of Kafka Topic (e.g: Nifi, StreamLine)
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry Concepts
• Schema Group
A logical grouping/container for
similar type of schemas or
based any criteria that the
customer has from managing
the schemas
• Schema Metadata
Metadata associated with a
named schema.
• Schema Version
The actual versioned schema
associated a schema meta
definition
Schema Metadata 1
Schema Name
Schema Type
Description
Compatibility Policy
Serializers
Deserializers
Schema Group
Group Name
SchemaVersion 3
SchemaVersion 2
Schema Version 1
version
text
Fingerprint
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry
Schema Registry Component Architecture
SR Web Server
Schema Registry
Web App
REST APISchema Registry Client
Java Client
Integrations
Nifi Processors Kafka Ser/Des StreamLine
Schema
Storage
Pluggable Storage
Serializer/Deserializer
Jar Storage
MySQL In-Memory Local File
System
HDFSPostgres
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Writer/Reader schemas
 Writer schema
– Senders/Producers use this schema while sending the payloads according to the given schema viz
writer’s schema
 Reader/Projection schema
– Receivers uses this schema to project the received payload written with a writer schema.
Sender Receiver
Writer
Schema
Writer
Schema
Projection
Schema
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema evolution
Producer
v2
Consumer
v2
Producer
v1
Producer
v4
Consumer
v5
Producer
v1
Consumer
v7
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Compatibility Policies
 What is a Compatibility Policy?
– Defines the rules of how the schemas can evolve
– Subsequent version updates has to honor the schema’s original compatibility.
 Policies Supported
– Backward
– Forward
– Both
– None
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Backward compatibility
 New version of a schema would be compatible with earlier version of that schema.
 Data written from earlier version of the schema, can be read with a new version of the
schema.
V1
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
}
]
}
V2
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
},
{
"name": "pages",
"type": "int",
"default": -1
}
]
}
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Forward compatibility
 Existing schema is compatible with future versions of the schema.
 That means the data written from new version of the schema can still be read with old
version of the schema.
V1
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
}
]
}
V2
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
},
{
"name": "pages",
"type": "int"
}
]
}
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Both/Full compatibility
 New version of the schema provides both backward and forward compatibilities.
V1
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
}
]
}
V2
{
"type": "record",
"name": "book",
"namespace": "registry.example",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "color",
"type": "string",
"default": "blue"
},
{
"name": "pages",
"type": "int",
"default": -1
},
{
"name": "title",
"type" : "string",
"default": ""
}
]
}
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema composition
 Schemas can be shared and reused with in existing schemas
 Inbuilt support in default serializer/deserializer to build effective schemas
{
"name": "account",
"namespace": "com.hortonworks.example.types",
"includeSchemas": [
{
"name": "utils”
}
],
"type": "record",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "id",
"type": "com.hortonworks.datatypes.uuid"
}
]
}
{
"name": "uuid",
"type": "record",
"namespace": "com.hortonworks.datatypes",
"doc": "A Universally Unique Identifier, in canonical form in
lowercase. This is generated from java.util.UUID Example:
de305d54-75b4-431b-adb2-eb6b9e546014",
"fields": [
{
"name": "value",
"type": "string",
"default": ""
}
]
}
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Sender/Receiver flow
Local
schema/serdes
cache
Serializer
Sender
Schema Registry
Client
Message Store
Local
schema/serdes
cache
Deserializer
Schema Registry
Client
version
payload
version
payload
Schema Storage SerDes Storage
Receiver
SchemaRegistrySchemaRegistry SchemaRegistry
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Serializers/Deserializers
 Snapshot based serializer/deserializer
– Serializes the complete payload
– Deserializes the payload to respective type
 Pull based serializer/deserializer
– Serialize whatever elements are required and ignore other elements
– Pull out whatever elements that are required to build the desired object
 Push based deserializer
– Gives callback to receive parsing events for respective fields in schema
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema registry client
 REST based client
 Caching
– Metadata
– Schema versions
– Ser/des libs and class loaders
 URL selectors
– Round robin
– Failover
– Custom
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HA
 Storage provider
– Depends on transactional support of
underlying SQL stores
– Spinup required schema registry
instances
 Supports HA at SchemaRegistry
– Using ZK/Curator
– Automatic failover of master
– Master gets all writes
– Slaves receive only reads
SchemaRegistry
storage
SchemaRegistrySchemaRegistry
SchemaRegistry
SchemaRegistry
SchemaRegistry
storage
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Integration of Schema Registry
 Kafka
– Using producer/consumer API for serializer/deserializer
 Nifi Processors for Schema Registry
– Fetch Schema
– Serialize/Deserialize with Schema
 StreamLine
– Lookup Schema of a Kafka Topic
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Kafka integration
Local
schema/serdes
cache
KafkaAvro
Serializer
Producer
Schema Registry
Client
Local
schema/serdes
cache
KafkaAvro
Deserializer
Schema Registry
Client
version
payload
version
payload
Consumer
SchemaRegistrySchemaRegistry SchemaRegistry
Kafka
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Kafka Avro ser/des protocol
 ser/des can be implemented with different protocols
 Default ser/des send protocol/schema versions as part of the binary payload of kafka
messages
– Can be enhanced to use headers/metadata instead of the message payload
– Custom ser/des can be registered for schemas.
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nifi integration
 Nifi Controller Service
 Nifi processors
– Transforms
• Avro – CSV
• Avro – Json
• Json – CSV
– Extracting Avro fields
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry UI
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
WIP/Future enhancements
 Security
– Kerberos support
– Default authorizers and Apache Ranger support
 Archiving schemas
 Notifications
– New versions
– Archiving
 Converters
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Try it out!
 https://github.com/hortonworks/registry
 https://groups.google.com/forum/#!forum/registry
 Open sourced under Apache license
 Apache incubation soon
 Contributions are welcome
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Q & A
Ad

More Related Content

What's hot (20)

State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
DataWorks Summit/Hadoop Summit
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
DataWorks Summit/Hadoop Summit
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
DataWorks Summit
 
introduction-to-apache-kafka
introduction-to-apache-kafkaintroduction-to-apache-kafka
introduction-to-apache-kafka
Yifeng Jiang
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
DataWorks Summit/Hadoop Summit
 
MLeap: Deploy Spark ML Pipelines to Production API Servers
MLeap: Deploy Spark ML Pipelines to Production API ServersMLeap: Deploy Spark ML Pipelines to Production API Servers
MLeap: Deploy Spark ML Pipelines to Production API Servers
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
Hortonworks
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
DataWorks Summit
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Mingliang Liu
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Schema registry
Schema registrySchema registry
Schema registry
Whiteklay
 
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
DataWorks Summit
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
DataWorks Summit/Hadoop Summit
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
DataWorks Summit
 
introduction-to-apache-kafka
introduction-to-apache-kafkaintroduction-to-apache-kafka
introduction-to-apache-kafka
Yifeng Jiang
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
DataWorks Summit/Hadoop Summit
 
MLeap: Deploy Spark ML Pipelines to Production API Servers
MLeap: Deploy Spark ML Pipelines to Production API ServersMLeap: Deploy Spark ML Pipelines to Production API Servers
MLeap: Deploy Spark ML Pipelines to Production API Servers
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
Hortonworks
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
DataWorks Summit
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Mingliang Liu
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
 
Schema registry
Schema registrySchema registry
Schema registry
Whiteklay
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
DataWorks Summit
 

Similar to Schema Registry - Set you Data Free (20)

Tutorial Expert How-To - Create a model for Avro schemas
Tutorial Expert How-To - Create a model for Avro schemasTutorial Expert How-To - Create a model for Avro schemas
Tutorial Expert How-To - Create a model for Avro schemas
PascalDesmarets1
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Sriharsha Chintalapani
 
Ballerina- A programming language for the networked world
Ballerina- A programming language for the networked worldBallerina- A programming language for the networked world
Ballerina- A programming language for the networked world
Integration Meetups
 
Ballerina- A programming language for the networked world
Ballerina- A programming language for the networked worldBallerina- A programming language for the networked world
Ballerina- A programming language for the networked world
Asangi Jasenthuliyana
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
alanfgates
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDB
ArangoDB Database
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
Max Neunhöffer
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
HostedbyConfluent
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
HostedbyConfluent
 
MySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPIMySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPI
Rui Quelhas
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache PulsarPulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Shivji Kumar Jha
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
Max Neunhöffer
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Cloudera, Inc.
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Technology Stack Discussion
Technology Stack DiscussionTechnology Stack Discussion
Technology Stack Discussion
Zaiyang Li
 
Avro
AvroAvro
Avro
Eric Turcotte
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
Tutorial Expert How-To - Create a model for Avro schemas
Tutorial Expert How-To - Create a model for Avro schemasTutorial Expert How-To - Create a model for Avro schemas
Tutorial Expert How-To - Create a model for Avro schemas
PascalDesmarets1
 
Schema Registry & Stream Analytics Manager
Schema Registry  & Stream Analytics ManagerSchema Registry  & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Sriharsha Chintalapani
 
Ballerina- A programming language for the networked world
Ballerina- A programming language for the networked worldBallerina- A programming language for the networked world
Ballerina- A programming language for the networked world
Integration Meetups
 
Ballerina- A programming language for the networked world
Ballerina- A programming language for the networked worldBallerina- A programming language for the networked world
Ballerina- A programming language for the networked world
Asangi Jasenthuliyana
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
alanfgates
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDB
ArangoDB Database
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
Max Neunhöffer
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
HostedbyConfluent
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
HostedbyConfluent
 
MySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPIMySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPI
Rui Quelhas
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache PulsarPulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Shivji Kumar Jha
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Cloudera, Inc.
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Technology Stack Discussion
Technology Stack DiscussionTechnology Stack Discussion
Technology Stack Discussion
Zaiyang Li
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
Ad

More from DataWorks Summit/Hadoop Summit (20)

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
DataWorks Summit/Hadoop Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
DataWorks Summit/Hadoop Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
DataWorks Summit/Hadoop Summit
 
Ad

Recently uploaded (20)

Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 

Schema Registry - Set you Data Free

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Satish Duggana, Hortonworks Dataworks summit - 2017, Munich
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Introduction  What is Schema Registry? • A shared repository of schemas that allows applications to flexibly interact with each other  What Value does Schema Registry Provide? – Data Governance • Provide reusable schema • Define relationship between schemas • Enable generic format conversion, and generic routing – Operational Efficiency • To avoid attaching schema to every piece of data • Producers and consumers can evolve at different rates  Example Use – Register Schemas for Kafka Topics to be used by consumers of Kafka Topic (e.g: Nifi, StreamLine)
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Concepts • Schema Group A logical grouping/container for similar type of schemas or based any criteria that the customer has from managing the schemas • Schema Metadata Metadata associated with a named schema. • Schema Version The actual versioned schema associated a schema meta definition Schema Metadata 1 Schema Name Schema Type Description Compatibility Policy Serializers Deserializers Schema Group Group Name SchemaVersion 3 SchemaVersion 2 Schema Version 1 version text Fingerprint
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Schema Registry Component Architecture SR Web Server Schema Registry Web App REST APISchema Registry Client Java Client Integrations Nifi Processors Kafka Ser/Des StreamLine Schema Storage Pluggable Storage Serializer/Deserializer Jar Storage MySQL In-Memory Local File System HDFSPostgres
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Writer/Reader schemas  Writer schema – Senders/Producers use this schema while sending the payloads according to the given schema viz writer’s schema  Reader/Projection schema – Receivers uses this schema to project the received payload written with a writer schema. Sender Receiver Writer Schema Writer Schema Projection Schema
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema evolution Producer v2 Consumer v2 Producer v1 Producer v4 Consumer v5 Producer v1 Consumer v7
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Compatibility Policies  What is a Compatibility Policy? – Defines the rules of how the schemas can evolve – Subsequent version updates has to honor the schema’s original compatibility.  Policies Supported – Backward – Forward – Both – None
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Backward compatibility  New version of a schema would be compatible with earlier version of that schema.  Data written from earlier version of the schema, can be read with a new version of the schema. V1 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ] } V2 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int", "default": -1 } ] }
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Forward compatibility  Existing schema is compatible with future versions of the schema.  That means the data written from new version of the schema can still be read with old version of the schema. V1 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ] } V2 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int" } ] }
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Both/Full compatibility  New version of the schema provides both backward and forward compatibilities. V1 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ] } V2 { "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int", "default": -1 }, { "name": "title", "type" : "string", "default": "" } ] }
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema composition  Schemas can be shared and reused with in existing schemas  Inbuilt support in default serializer/deserializer to build effective schemas { "name": "account", "namespace": "com.hortonworks.example.types", "includeSchemas": [ { "name": "utils” } ], "type": "record", "fields": [ { "name": "name", "type": "string" }, { "name": "id", "type": "com.hortonworks.datatypes.uuid" } ] } { "name": "uuid", "type": "record", "namespace": "com.hortonworks.datatypes", "doc": "A Universally Unique Identifier, in canonical form in lowercase. This is generated from java.util.UUID Example: de305d54-75b4-431b-adb2-eb6b9e546014", "fields": [ { "name": "value", "type": "string", "default": "" } ] }
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Sender/Receiver flow Local schema/serdes cache Serializer Sender Schema Registry Client Message Store Local schema/serdes cache Deserializer Schema Registry Client version payload version payload Schema Storage SerDes Storage Receiver SchemaRegistrySchemaRegistry SchemaRegistry
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Serializers/Deserializers  Snapshot based serializer/deserializer – Serializes the complete payload – Deserializes the payload to respective type  Pull based serializer/deserializer – Serialize whatever elements are required and ignore other elements – Pull out whatever elements that are required to build the desired object  Push based deserializer – Gives callback to receive parsing events for respective fields in schema
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema registry client  REST based client  Caching – Metadata – Schema versions – Ser/des libs and class loaders  URL selectors – Round robin – Failover – Custom
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HA  Storage provider – Depends on transactional support of underlying SQL stores – Spinup required schema registry instances  Supports HA at SchemaRegistry – Using ZK/Curator – Automatic failover of master – Master gets all writes – Slaves receive only reads SchemaRegistry storage SchemaRegistrySchemaRegistry SchemaRegistry SchemaRegistry SchemaRegistry storage
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Integration of Schema Registry  Kafka – Using producer/consumer API for serializer/deserializer  Nifi Processors for Schema Registry – Fetch Schema – Serialize/Deserialize with Schema  StreamLine – Lookup Schema of a Kafka Topic
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka integration Local schema/serdes cache KafkaAvro Serializer Producer Schema Registry Client Local schema/serdes cache KafkaAvro Deserializer Schema Registry Client version payload version payload Consumer SchemaRegistrySchemaRegistry SchemaRegistry Kafka
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka Avro ser/des protocol  ser/des can be implemented with different protocols  Default ser/des send protocol/schema versions as part of the binary payload of kafka messages – Can be enhanced to use headers/metadata instead of the message payload – Custom ser/des can be registered for schemas.
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Nifi integration  Nifi Controller Service  Nifi processors – Transforms • Avro – CSV • Avro – Json • Json – CSV – Extracting Avro fields
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry UI
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved WIP/Future enhancements  Security – Kerberos support – Default authorizers and Apache Ranger support  Archiving schemas  Notifications – New versions – Archiving  Converters
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Try it out!  https://github.com/hortonworks/registry  https://groups.google.com/forum/#!forum/registry  Open sourced under Apache license  Apache incubation soon  Contributions are welcome
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Q & A

Editor's Notes

  • #20: Exposes operations to serialize and deserialize the contents of the FlowFile as well as the operations to query the actual schema. NOTE: at the moment only AVRO schema type is supported. UpdateAttributeViaSchemaRegistry Transform processors