Building a Streaming Microservice Architecture: with Apache Spark Structured Streaming and Friends
Scott Haines
Senior Principal Software Engineer
Introductions
▪ I work at Twilio
▪ Over 10 years working on Streaming Architectures
▪ Helped bring Streaming-First Spark Architecture to Voice & Voice Insights
▪ Lead Spark Office Hours @ Twilio
▪ Love Distributed Systems
About Me
Scott Haines: Senior Principal Software Engineer @newfront
Agenda
▪ The Big Picture: What the Architecture looks like
▪ Protocol Buffers: What they are. Why they rule!
▪ gRPC / Protocol Streams: Versioned Data Lineage as a Service
▪ How this fits into Spark: Structured Streaming with Protobuf support
The Big Picture
Streaming Microservice Architecture
[Diagram: a gRPC Client, HTTP/2, three gRPC Servers, two Kafka Brokers, a Spark Application, HDFS and S3; the steps are numbered 1 to 9]
Streaming Microservice Architecture
[Diagram: several Kafka Topics, Spark Applications, and Data Tables, plus a gRPC Server]
Protocol Buffers aka protobuf
Protocol Buffers
Why use them?
▪ Strict Types
  ▪ Enforce structure at compile time
  ▪ Similar to StructType in Apache Spark
  ▪ Interoperable with Spark via ExpressionEncoding extension
▪ Versioning API / Data Pipeline
  ▪ Compiled protobuf definitions (*.proto) can be versioned and released like normal code
▪ Interoperable
  ▪ Pick your favorite programming language, then compile and release
  ▪ Supports Java, Scala, C++, Go, Obj-C, Node.js, Python and more
Protocol Buffers
Why use them?
▪ Code Gen
  ▪ Automatically generate Builder classes
  ▪ Being lazy is okay!
▪ Optimized
  ▪ Messages are optimized and ship with their own Serialization/Deserialization mechanics (SerDe)
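To make the "optimized SerDe" point concrete, here is a minimal stdlib-only Python sketch of how protobuf lays out a single integer field on the wire (a tag followed by a base-128 varint). It is not the generated code the slides refer to: the helper names `encode_varint`, `decode_varint`, `encode_int_field`, and `decode_int_field` are made up for illustration, and only non-negative integers are handled.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag = (field_number << 3) | wire type 0."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def decode_int_field(data: bytes) -> Tuple[int, int]:
    """Decode one varint field; return (field number, value)."""
    tag, pos = decode_varint(data)
    if tag & 0x07 != 0:
        raise ValueError("not a varint field")
    value, _ = decode_varint(data, pos)
    return tag >> 3, value


payload = encode_int_field(1, 150)
print(payload.hex())  # 089601
```

Field 1 with the value 150 takes three bytes, which is where much of the compactness comes from. In practice the generated message classes handle this encoding, so application code rarely touches it directly.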
gRPC and Protocol Streams
gRPC
What is it?
▪ High Performance
  ▪ Compact Binary Exchange Format
  ▪ Make API calls to the server as if they were local to the client
▪ Cross Language / Cross Platform
  ▪ Autogenerate API definitions for idiomatic client and server code; just implement the interfaces
▪ Bi-Directional Streaming
  ▪ Pluggable support for streaming with HTTP/2 transport
[Diagram: a gRPC Client connected over HTTP/2 to three gRPC Servers]
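The shape of a bi-directional stream can be sketched with plain Python generators: the handler consumes an iterator of requests and yields responses as it goes. This is a single-process illustration only, with no gRPC library or HTTP/2 transport involved; `track_stream` and `client_requests` are hypothetical names.

```python
from typing import Iterable, Iterator


def track_stream(requests: Iterable[str]) -> Iterator[str]:
    """Consume a stream of ad ids and yield one acknowledgement per request."""
    for count, ad_id in enumerate(requests, start=1):
        yield f"ack {count}: {ad_id}"


def client_requests() -> Iterator[str]:
    # The client produces requests lazily; nothing is buffered up front.
    for ad_id in ("ad-1", "ad-2", "ad-3"):
        yield ad_id


# Each acknowledgement is produced as soon as its request has been read.
for ack in track_stream(client_requests()):
    print(ack)
```

Because both sides are lazy, requests and responses interleave rather than waiting for the whole request stream to finish.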
gRPC Example: AdTracking
gRPC
How it works
▪ Define Messages
  ▪ What kind of data are you sending?
    ▪ Example: Click Tracking / Impression Tracking
  ▪ What is necessary for the public interface?
    ▪ Example: AdImpression and Response
gRPC
How it works
▪ Service Definition
  ▪ Compile your rpc definition to generate Service Interfaces
  ▪ Uses the same protobuf definition (service.proto) as your Client/Server request and response objects
  ▪ Can be used to create a binding Service Contract within your organization or publicly
gRPC
How it works
▪ Implement the Service
  ▪ Compilation of the Service auto-generates your interfaces.
  ▪ Just implement the service contracts.
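The define / compile / implement workflow can be sketched without the gRPC toolchain. In the stdlib-only Python sketch below, an abstract base class stands in for the interface that compiling a service definition would generate. The fields on `AdImpression` and `Response`, the `ad_track` method name, and `InMemoryClickTrackService` are illustrative assumptions, not the talk's actual service.proto.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AdImpression:
    # Hypothetical fields; the slides name the message but not its contents.
    ad_id: str
    user_id: str
    timestamp_ms: int


@dataclass(frozen=True)
class Response:
    # Hypothetical fields for the reply message.
    accepted: bool
    message: str = ""


class ClickTrackService(ABC):
    """Stand-in for the interface a compiled service definition would generate."""

    @abstractmethod
    def ad_track(self, tracked_ad: AdImpression) -> Response:
        ...


class InMemoryClickTrackService(ClickTrackService):
    """The only hand-written part: the implementation of the contract."""

    def __init__(self) -> None:
        self.seen: list = []

    def ad_track(self, tracked_ad: AdImpression) -> Response:
        if not tracked_ad.ad_id:
            return Response(accepted=False, message="ad_id is required")
        self.seen.append(tracked_ad)
        return Response(accepted=True)


service = InMemoryClickTrackService()
reply = service.ad_track(AdImpression("ad-42", "user-7", 1_600_000_000_000))
print(reply.accepted)  # True
```

The contract itself cannot be instantiated; only a class that implements every method can, which is a rough runtime analogue of the compile-time guarantee the slides describe.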
gRPC
How it works
▪ Protocol Streams
  ▪ Messages (protobuf) are emitted to Kafka topic(s) from the Server Layer
  ▪ Protocol Streams are now available from the Kafka Topics bound to a given Service / Collection of Messages
  ▪ Sets up Spark for the Hand-Off
gRPC
System Architecture
[Diagram: a gRPC Client, HTTP/2, three gRPC Servers, and two Kafka Brokers]
Topic: ads.click.stream
Client: service.adTrack(trackedAd)
Server: ClickTrackService.adTrack(trackedAd)
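The hand-off from the server layer to a topic can be sketched in a few lines of stdlib Python. Everything here is a stand-in: `InMemoryBroker` is a toy replacement for Kafka, JSON bytes replace protobuf's binary serialization, and the `TrackedAd` fields are hypothetical. Only the topic name comes from the slide.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List

TOPIC = "ads.click.stream"  # topic name taken from the slide


@dataclass(frozen=True)
class TrackedAd:
    # Hypothetical fields for illustration only.
    ad_id: str
    user_id: str

    def to_bytes(self) -> bytes:
        # JSON stands in for protobuf's binary serialization in this sketch.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(raw: bytes) -> "TrackedAd":
        return TrackedAd(**json.loads(raw.decode("utf-8")))


class InMemoryBroker:
    """Toy stand-in for Kafka: maps a topic name to a list of payloads."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[bytes]] = defaultdict(list)

    def send(self, topic: str, value: bytes) -> None:
        self._topics[topic].append(value)

    def read_all(self, topic: str) -> List[bytes]:
        return list(self._topics[topic])


class ClickTrackServer:
    """Server layer: accept the call, then emit the message to the topic."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker

    def ad_track(self, tracked_ad: TrackedAd) -> bool:
        self._broker.send(TOPIC, tracked_ad.to_bytes())
        return True


broker = InMemoryBroker()
server = ClickTrackServer(broker)
server.ad_track(TrackedAd("ad-42", "user-7"))

# A downstream consumer (Spark, in the talk) decodes with the same message type.
decoded = [TrackedAd.from_bytes(raw) for raw in broker.read_all(TOPIC)]
print(decoded)
```

The point of the pattern is that producer and consumer share one message definition, so whatever lands on the topic can be decoded downstream without a separate schema negotiation.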
Structuring Protocol Streams: with Structured Streaming and protobuf
Structured Streaming with Protobuf
From Protocol Buffer to StructType through ExpressionEncoders
▪ Expression Encoding
  ▪ Natively interop with Protobuf in Apache Spark.
  ▪ Protobuf to case class conversion from scalapb.
  ▪ Product encoding comes for free via import sparkSession.implicits._
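The idea of deriving a schema from a typed message class, as product encoders do for Scala case classes, can be illustrated in stdlib Python with a dataclass. This is an analogy rather than Spark code: `schema_of`, the `AdImpression` fields, and the type-name mapping are assumptions made for the sketch.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple

# Assumed mapping from Python types to Spark-style type names, for illustration.
SPARK_TYPE_NAMES = {str: "string", int: "long", float: "double", bool: "boolean"}


@dataclass(frozen=True)
class AdImpression:
    # Hypothetical message fields.
    ad_id: str
    user_id: str
    timestamp_ms: int
    viewable: bool


def schema_of(message_cls) -> List[Tuple[str, str]]:
    """Return (field name, type name) pairs in declaration order."""
    return [(f.name, SPARK_TYPE_NAMES[f.type]) for f in fields(message_cls)]


for name, type_name in schema_of(AdImpression):
    print(f"{name}: {type_name}")
```

The schema is never written by hand; it follows from the message definition, so the two cannot drift apart.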
Structured Streaming with Protobuf
From Protocol Buffer to StructType through ExpressionEncoders
▪ Native is Better
  ▪ Strict Native Kafka to DataFrame conversion with no need for transformation to intermediary types
  ▪ Mutations and Joins can be done across DataFrame or Datasets API.
  ▪ Create RealTime Data Pipelines, Machine Learning Pipelines and More.
  ▪ Rest at Night knowing the pipelines are safe!
Structured Streaming with Protobuf
From Protocol Buffer to StructType through ExpressionEncoders
▪ Strict Data Writer
  ▪ Compiled / Versioned Protobuf can be used to strictly enforce the format of your Writers
  ▪ Use Protobuf to define the StructType that can be used in your conversions to Parquet* (*must abide by Parquet nesting rules)
  ▪ Declarative Input / Output means that Streaming Applications don’t go down due to incompatible Data Streams
  ▪ Can also be used with Delta so that the version of the schema lines up with compiled Protobuf.
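One way to picture a strict writer is a gate that checks every record against the declared schema and sets aside anything that does not conform, instead of raising and stopping the stream. The stdlib Python sketch below shows that policy. `SCHEMA`, `conforms`, and `strict_write` are hypothetical names, and setting bad rows aside is only one possible choice, not something the slides prescribe.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical declared schema: field name -> required Python type.
SCHEMA: Dict[str, type] = {"ad_id": str, "user_id": str, "timestamp_ms": int}


def conforms(record: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """True when the record has exactly the declared fields with the declared types."""
    if record.keys() != schema.keys():
        return False
    return all(type(record[name]) is expected for name, expected in schema.items())


def strict_write(records, schema) -> Tuple[List[dict], List[dict]]:
    """Split a batch into rows to write and rows to set aside, without raising."""
    written, rejected = [], []
    for record in records:
        (written if conforms(record, schema) else rejected).append(record)
    return written, rejected


batch = [
    {"ad_id": "ad-1", "user_id": "u-1", "timestamp_ms": 1},
    {"ad_id": "ad-2", "user_id": "u-2", "timestamp_ms": "not-a-number"},
    {"ad_id": "ad-3", "user_id": "u-3"},
]
written, rejected = strict_write(batch, SCHEMA)
print(len(written), len(rejected))  # 1 2
```

Only the first record matches the schema; the second has a wrongly typed field and the third is missing one, so both are set aside while the batch as a whole still completes.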
Structured Streaming with Protobuf
Example: Streaming Transformation Pipeline
▪ Real World Use Case
  ▪ Close of Books Data Lineage Job
  ▪ Uses End to End Protobuf
  ▪ Enables teams to move quickly with guarantees regarding the Data being published and at what Frequency
  ▪ Can be emitted at different speeds to different locations based on configuration
Streaming Microservice Architecture
[Diagram: a gRPC Client, HTTP/2, three gRPC Servers, two Kafka Brokers, a Spark Application, HDFS and S3; the steps are numbered 1 to 9]
Recap
What We Learned
▪ Protobuf
  ▪ Language Agnostic Structured Data
  ▪ Compile Time Guarantees
  ▪ Lightning Fast Serialization/Deserialization
▪ gRPC
  ▪ Language Agnostic Binary Services
  ▪ Low-Latency
  ▪ Compile Time Guarantees
  ▪ Smart Framework
▪ Kafka
  ▪ Highly Available
  ▪ Native Connector for Spark
  ▪ Topic Based Binary Protobuf Store
  ▪ Use to Pass Records to one or more Downstream Services
▪ Structured Streaming
  ▪ Handle Data Reliably
  ▪ Protobuf to Dataset / DataFrames is awesome
  ▪ Parquet / Delta plays nice as Columnar Data Exchange format
Thanks @newfrontcreative
@newfront
