Hackolade Tutorial
Visual design of Avro schema
Copyright © 2016-2023 Hackolade 1
About Apache Avro
• Open-source project
• Provides
• Data serialisation framework
• Data exchange services
• Often used with
• Kafka pub-sub pipelines
• Data lake storage
• Row-oriented object container
• as opposed to Parquet which is column-oriented
• Language independent, compact and efficient
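The row-oriented versus column-oriented distinction can be illustrated with a small Python sketch. It uses neither Avro nor Parquet; the sample records and the helper `to_columns` are made up for this illustration.

```python
# Hypothetical sample data, not taken from the tutorial.
rows = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Linus", "country": "FI"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (columnar layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented (Avro-like): all fields of one record sit together,
# which suits writing and reading whole messages one at a time.
first_record = rows[0]

# Column-oriented (Parquet-like): all values of one field sit together,
# which suits scanning a few fields across many records.
all_countries = columns["country"]
```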
Components of an Avro Schema Model
• An Avro schema can be viewed as a language-agnostic
contract for systems to interoperate.
• 4 attributes:
• Type: specifies the data type of the schema,
whether it is a complex type or a primitive value. The top
level of an Avro schema is usually a “record” type,
although the Avro specification also allows other types there.
• Name: the name of the Avro schema being defined
• Namespace: a dot-separated qualifier that, together with
the name, forms the full name of the Avro schema
• Fields: the individual data elements of the
record. Each field can be of a primitive or
a complex type, and complex types can in
turn nest both primitive and complex
data types.
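As a minimal sketch of the four attributes listed above, the following Python snippet builds an Avro record schema as a dictionary and serialises it to the JSON text that would go into an `.avsc` file. The `Customer` record, its namespace, its fields, and the helper `full_name` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical schema for illustration; the names are not from the tutorial.
customer_schema = {
    "type": "record",                      # Type: a record at the top level
    "name": "Customer",                    # Name
    "namespace": "com.example.sales",      # Namespace
    "fields": [                            # Fields
        {"name": "id", "type": "long"},                                  # primitive
        {"name": "email", "type": ["null", "string"], "default": None},  # union
        {"name": "tags", "type": {"type": "array", "items": "string"}},  # complex
    ],
}


def full_name(schema):
    """Return the namespace-qualified name of a named schema.

    Simplification: ignores the case where "name" itself contains dots.
    """
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


schema_json = json.dumps(customer_schema, indent=2)  # text of the .avsc file
print(full_name(customer_schema))
```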
About Apache Avro (continued)
• Uses JSON to define schema and data types
• Warning: an Avro schema is quite different from JSON Schema, even though it is
itself written in JSON
• A container file has a header containing the schema, followed by 1 or more
data blocks
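The container layout can be sketched in plain Python, following the object container file layout in the Avro specification (magic bytes, a metadata map holding the schema, a 16-byte sync marker, then data blocks). This is an illustration only, assuming the uncompressed "null" codec; the `Greeting` schema and the helper names are hypothetical, and real applications would use an Avro library instead.

```python
import json
import os

MAGIC = b"Obj\x01"  # 4-byte marker that opens every Avro container file


def encode_long(n):
    """Avro int/long encoding: zigzag, then 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(data):
    """Avro bytes/string encoding: length as a long, then the raw bytes."""
    return encode_long(len(data)) + data


def build_header(schema, sync):
    """Magic, then a metadata map (string -> bytes), then the sync marker."""
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",
    }
    body = encode_long(len(meta))
    for key, value in meta.items():
        body += encode_bytes(key.encode("utf-8")) + encode_bytes(value)
    body += encode_long(0)  # a zero count ends the map
    return MAGIC + body + sync


def build_block(encoded_records, sync):
    """Object count, payload size in bytes, the payload, then the sync marker."""
    payload = b"".join(encoded_records)
    return (encode_long(len(encoded_records)) + encode_long(len(payload))
            + payload + sync)


# Hypothetical one-field schema; a record is encoded as its fields in order.
schema = {"type": "record", "name": "Greeting",
          "fields": [{"name": "text", "type": "string"}]}
sync = os.urandom(16)
records = [encode_bytes(text.encode("utf-8")) for text in ("hello", "world")]
container = build_header(schema, sync) + build_block(records, sync)
```

Because the schema travels in the header, a reader can decode the blocks without any out-of-band information.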
Avro schema evolution
• Avro allows for powerful schema evolution
• Can achieve backward- and forward-compatibility if done well
• Particularly useful for “integration” services, where many producers and consumers
may evolve their applications against different versions of the schema
• Schemas and their evolving versions can be published in a schema
registry
• Goal is to maximise compatibility when decoupling the lifecycle of
publishers and consumers
• Specific guidelines and best practices
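A simplified sketch of how backward and forward compatibility play out for record fields: a reader keeps the fields it knows, ignores extra ones, and falls back to a default for fields the writer did not provide. This mirrors the record rules of Avro schema resolution but leaves out type promotion, aliases, unions, and nested types. The `Order` schemas and the functions `resolve` and `can_read` are hypothetical.

```python
def resolve(record, reader_schema):
    """Project an already-decoded record onto the reader's fields (simplified)."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


def can_read(reader_schema, writer_schema):
    """True if every reader field is either written or has a default."""
    written = {f["name"] for f in writer_schema["fields"]}
    return all(f["name"] in written or "default" in f
               for f in reader_schema["fields"])


# Hypothetical schema versions.
v1 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "currency", "type": "string", "default": "EUR"}]}
# Adding a field without a default breaks readers of old data.
v3 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "customer", "type": "string"}]}

backward = resolve({"id": 7}, v2)                   # new reader, old data
forward = resolve({"id": 8, "currency": "USD"}, v1)  # old reader, new data
```

Here `v2` stays compatible in both directions with `v1` because the added field carries a default, while `v3` cannot read data written with `v1`.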
Benefits of Avro schema design in Hackolade Studio
• Visual design of Avro schema
• Anticipate future schema evolution
• Facilitate message validation
• Optimize message payload
• Enable compatibility and interoperability
• Integrate with Confluent Schema Registry and others
• Promote schema reuse (despite limited official specification and
documentation) via
• Avro namespace references
• Confluent Schema Registry’s
• Schema references
• Union schemas
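Reuse via namespace references can be sketched as follows: a named type is defined once and later referred to by its namespace-qualified name. The `Order` and `Address` schemas and the checker `unresolved` are hypothetical, and the checker is deliberately simplified (it assumes definitions appear before use and ignores dotted values in "name").

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double",
              "bytes", "string"}

# Hypothetical schema: "Address" is defined once, then referenced by name.
order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.sales",
    "fields": [
        {"name": "billing", "type": {
            "type": "record",
            "name": "Address",
            "namespace": "com.example.common",
            "fields": [{"name": "city", "type": "string"}],
        }},
        # Reference to the type above by its namespace-qualified name.
        {"name": "shipping", "type": "com.example.common.Address"},
    ],
}


def unresolved(schema, namespace=None, known=None):
    """Walk a schema depth-first and list type names used before definition."""
    known = set() if known is None else known
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return []
        # Unqualified names inherit the enclosing namespace.
        name = schema if "." in schema or not namespace else f"{namespace}.{schema}"
        return [] if name in known else [name]
    if isinstance(schema, list):  # union: check every branch
        return [n for branch in schema for n in unresolved(branch, namespace, known)]
    kind = schema["type"]
    if kind in ("record", "enum", "fixed"):
        namespace = schema.get("namespace", namespace)
        known.add(f"{namespace}.{schema['name']}" if namespace else schema["name"])
        return [n for f in schema.get("fields", [])
                for n in unresolved(f["type"], namespace, known)]
    if kind == "array":
        return unresolved(schema["items"], namespace, known)
    if kind == "map":
        return unresolved(schema["values"], namespace, known)
    return unresolved(kind, namespace, known)  # e.g. {"type": "string"}


missing = unresolved(order_schema)
```

An empty `missing` list means every reference in the schema points at a type defined earlier in the same document.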
Hackolade Studio support for Avro schema
• Graphical Avro schema design tool
• Can also import existing Avro schemas
• Integrates with the schema registries
(Confluent Schema Registry,
Azure Event Hubs Schema Registry,
Pulsar Schema Registry,
other CSR API-compatible registries)
• For forward- and reverse-engineering of the Avro schema with these registries
• Hackolade also integrates with Object Storage providers (S3, ADLS, …)
for data lakes
• Hackolade maintains (recursive) references between elements of the
Avro schema
Hackolade Studio support for Avro schema (continued)
• Outputs of Avro Schema modeling
in Hackolade Studio:
• Entity-Relationship Diagrams of
multi-record/event environments
• Hierarchical view of nested objects
• Syntactically correct schema
generation for
• users without technical knowledge
• developers wanting to improve
quality and productivity
• Documentation of schema in
different formats
• Conversion to/from other targets
(OpenAPI, RDBMS, NoSQL) via
Polyglot Data Modeling
Hackolade Studio and Confluent Schema Registry
• Central repository with RESTful interface
• Developers publish schemas for
• Versioning
• Safe schema evolution
• Enhanced integrity
• Data discoverability
• Self-hosted or SaaS
• Hackolade Studio native integration
• can forward/reverse engineer Avro schemas with the registry
• including support for the different Subject Name Strategies
• supports namespace references, schema references, and union schemas
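To show roughly what publishing a schema through the registry's REST interface involves, here is a sketch that derives a subject name for each of the three Subject Name Strategies and builds, without sending, a registration request that carries a schema reference. It says nothing about how Hackolade Studio does this internally. The registry URL, topic, subjects, and schema are assumptions for illustration.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: a locally running registry


def subject_name(strategy, topic, record_full_name, is_key=False):
    """Derive the subject for the three Confluent subject name strategies."""
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        return f"{topic}-{suffix}"
    if strategy == "RecordNameStrategy":
        return record_full_name
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_full_name}"
    raise ValueError(f"unknown strategy: {strategy}")


def build_register_request(subject, schema, references=()):
    """Build (but do not send) a POST /subjects/{subject}/versions request."""
    payload = {"schemaType": "AVRO", "schema": json.dumps(schema)}
    if references:
        payload["references"] = list(references)
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/subjects/{subject}/versions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


# Hypothetical schema that relies on an Address type registered elsewhere.
order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.sales",
    "fields": [{"name": "shipping", "type": "com.example.common.Address"}],
}
address_reference = {
    "name": "com.example.common.Address",  # name used inside the schema
    "subject": "address-value",            # hypothetical subject holding it
    "version": 1,
}

subject = subject_name("TopicNameStrategy", "orders", "com.example.sales.Order")
request = build_register_request(subject, order_schema, [address_reference])
```

Passing `request` to `urllib.request.urlopen` would submit it; that call is left out so the sketch runs without a registry.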
Reading material
• See Hackolade Studio online documentation
• The Hackolade Blog
• These excellent new books:
• MongoDB Data Modeling & Schema Design
• Many of the principles in the book are related to query-driven
modeling based on access patterns
• Neo4j Data Modeling
• Hackolade on social media: LinkedIn page, Twitter page
• Download Hackolade Studio for free
Questions?
Answers!
Tutorial Expert How-To - Create a model for Avro schemas
