SlideShare a Scribd company logo
Apache Arrow
Apache Arrow Flight
By Jacques Nadeau, PMC Apache Arrow
Apache Arrow
Why Arrow Flight: Arrow Promises Interoperability
• But it’s primary medium is in-memory
• Some work to support shared memory in-process
• But not all systems can be collocated
– Especially in a modern K8s/containerized deployment
• Shared memory has other problems:
– Reference management and security are complex
– Different requirements for long-term datasets versus
ephemeral datasets
Arrow Needs an RPC layer to simplify the creation of Data Applications
Apache Arrow
Arrow Messaging Paradigm: Batch Streams
Primary Communication:
• A Stream of Arrow Record
Batches
• Bulk transfer targeting efficient
movement
• Effectively Peer to Peer
Client Server
Put HeaderDataDataDataend
Thanks
endDataDataDataHeader
Get Descriptor
Specific Methods:
• Put Stream: Client sends a stream
to server
• Get Stream: Server sends a stream
to client
• Both Initiated by Client
Apache Arrow
Endpoint: Retrieved with Ticket
Flight
Location 1
Location 2
Arrow Messaging Paradigm: Stream Management
• Parallel consumption and locality awareness
– A flight is composed of streams
– Each stream has a FlightEndpoint: A opaque stream
ticket along with a consumption location
– Systems can take advantage of location information to
improve data locality
• Flights have two reference systems:
– Dotted path namespace for simple services (e.g.
marketing.yesterday.sales)
– Arbitrary binary command descriptor: (e.g. “select a,b
from foo where c > 10”)
• Support for Stream Listing
– ListFlights(Criteria)
– GetFlightInfo(FlightDescriptor)
Stream
Stream
Stream
Stream
Apache Arrow
Arrow Messaging Paradigm: Data as a Service Customization
• Arrow Flight Also support a simple Generic Messaging Framework
– Support Customization and Extensibility within the Arrow Flight context
• ListActions()
– Each Data Service can expose actions along with descriptions about what they support
– Each action should describe how to structure the action and corresponding result
– Normal HTTP2 exceptions can be used to manage error states
• DoAction(Action) => Result
– Generic Containers that can carry execute Data Service specific operations
– Examples might include: forget stream, load stream from disk,
• Actions and Results, each have:
– ActionType String token
– Body: JSON body of instruction
• Arrow Flight Clients can be written without knowledge of custom Actions/Results
– Lightweight wrappers can be built for Data Services as needed
– Or Simply use existing JSON tooling on top of generic API
Apache Arrow
But How? GRPC as a Foundation
• Generic RPC generation framework
• Built on HTTP/2 Standard
• Many language bindings (see right)
• Supports security &compression
• Uses Protobuf as primary format
• Designed primarily for application messaging
Apache Arrow
Extend GRPC To Better Work With Arrow Streams
• Streams are valid Protobuf Objects so systems that don’t
have custom processing can still consume Arrow streams
– The entirety of the Arrow RecordBatch is a single length
delimited Protobuf “bytes” field.
• For high performance situations, do direct byte encoding
and one-copy reads/zero-copy writes to avoid extra
copies/overhead
– Java Flight implementation cuts through multiple layers to
achieve this using currently released GRPC (despite no formal
support for it).
Apache Arrow
Check it out
• Arrow Flight Proposal
– https://github.com/jacques-n/arrow
• Example Usage in Dremio Formation
– https://github.com/jacques-n/formation
Ad

More Related Content

What's hot (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
DataWorks Summit/Hadoop Summit
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
DataWorks Summit/Hadoop Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
Dongmin Yu
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
Ilya Ganelin
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
GauravBiswas9
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
ScyllaDB
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
DataWorks Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
Dongmin Yu
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
Ilya Ganelin
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
ScyllaDB
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
DataWorks Summit
 

Similar to Apache Arrow Flight Overview (20)

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
Ageeleshwar K
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
振东 刘
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain
 
CHP-4.pptx
CHP-4.pptxCHP-4.pptx
CHP-4.pptx
FamiDan
 
2. RINA overview - TF workshop
2. RINA overview - TF workshop2. RINA overview - TF workshop
2. RINA overview - TF workshop
ARCFIRE ICT
 
Cs556 section3
Cs556 section3Cs556 section3
Cs556 section3
farshad33
 
Cs556 section3
Cs556 section3Cs556 section3
Cs556 section3
sehrish saba
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
Piotr Pelczar
 
lect4_SDNbasic_openflow.pptx
lect4_SDNbasic_openflow.pptxlect4_SDNbasic_openflow.pptx
lect4_SDNbasic_openflow.pptx
JesicaDcruz1
 
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
Edward Burns
 
APITalkMeetupSharable
APITalkMeetupSharableAPITalkMeetupSharable
APITalkMeetupSharable
Obaidur (OB) Rashid
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
RX-M Enterprises LLC
 
Module 1 part 2.pptx with clear notes and explanation
Module 1 part 2.pptx with clear notes and explanationModule 1 part 2.pptx with clear notes and explanation
Module 1 part 2.pptx with clear notes and explanation
farsankadavandy
 
Intro to web services
Intro to web servicesIntro to web services
Intro to web services
Neil Ghosh
 
Ead pertemuan-7
Ead pertemuan-7Ead pertemuan-7
Ead pertemuan-7
Yudha Arif Budiman
 
ONOS Platform Architecture
ONOS Platform ArchitectureONOS Platform Architecture
ONOS Platform Architecture
OpenDaylight
 
Module 5 Application and presentation Layer .pptx
Module 5 Application and presentation Layer .pptxModule 5 Application and presentation Layer .pptx
Module 5 Application and presentation Layer .pptx
AASTHAJAJOO
 
Software Defined Networking, Concepts and Practical Implementations
Software Defined Networking, Concepts and Practical ImplementationsSoftware Defined Networking, Concepts and Practical Implementations
Software Defined Networking, Concepts and Practical Implementations
Bangladesh Network Operators Group
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
Naveen Korakoppa
 
Asynchronous Python with Twisted
Asynchronous Python with TwistedAsynchronous Python with Twisted
Asynchronous Python with Twisted
Adam Englander
 
Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
Ageeleshwar K
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
振东 刘
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain
 
CHP-4.pptx
CHP-4.pptxCHP-4.pptx
CHP-4.pptx
FamiDan
 
2. RINA overview - TF workshop
2. RINA overview - TF workshop2. RINA overview - TF workshop
2. RINA overview - TF workshop
ARCFIRE ICT
 
Cs556 section3
Cs556 section3Cs556 section3
Cs556 section3
farshad33
 
lect4_SDNbasic_openflow.pptx
lect4_SDNbasic_openflow.pptxlect4_SDNbasic_openflow.pptx
lect4_SDNbasic_openflow.pptx
JesicaDcruz1
 
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
HTTP/2 Comes to Java: Servlet 4.0 and what it means for the Java/Jakarta EE e...
Edward Burns
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
RX-M Enterprises LLC
 
Module 1 part 2.pptx with clear notes and explanation
Module 1 part 2.pptx with clear notes and explanationModule 1 part 2.pptx with clear notes and explanation
Module 1 part 2.pptx with clear notes and explanation
farsankadavandy
 
Intro to web services
Intro to web servicesIntro to web services
Intro to web services
Neil Ghosh
 
ONOS Platform Architecture
ONOS Platform ArchitectureONOS Platform Architecture
ONOS Platform Architecture
OpenDaylight
 
Module 5 Application and presentation Layer .pptx
Module 5 Application and presentation Layer .pptxModule 5 Application and presentation Layer .pptx
Module 5 Application and presentation Layer .pptx
AASTHAJAJOO
 
Software Defined Networking, Concepts and Practical Implementations
Software Defined Networking, Concepts and Practical ImplementationsSoftware Defined Networking, Concepts and Practical Implementations
Software Defined Networking, Concepts and Practical Implementations
Bangladesh Network Operators Group
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
Naveen Korakoppa
 
Asynchronous Python with Twisted
Asynchronous Python with TwistedAsynchronous Python with Twisted
Asynchronous Python with Twisted
Adam Englander
 
Ad

Recently uploaded (20)

Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Journal of Soft Computing in Civil Engineering
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Value Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous SecurityValue Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous Security
Marc Hornbeek
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptxLidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
RishavKumar530754
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Value Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous SecurityValue Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous Security
Marc Hornbeek
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptxLidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
RishavKumar530754
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
Ad

Apache Arrow Flight Overview

  • 1. Apache Arrow Apache Arrow Flight By Jacques Nadeau, PMC Apache Arrow
  • 2. Apache Arrow Why Arrow Flight: Arrow Promises Interoperability • But it’s primary medium is in-memory • Some work to support shared memory in-process • But not all systems can be collocated – Especially in a modern K8s/containerized deployment • Shared memory has other problems: – Reference management and security are complex – Different requirements for long-term datasets versus ephemeral datasets Arrow Needs an RPC layer to simplify the creation of Data Applications
  • 3. Apache Arrow Arrow Messaging Paradigm: Batch Streams Primary Communication: • A Stream of Arrow Record Batches • Bulk transfer targeting efficient movement • Effectively Peer to Peer Client Server Put HeaderDataDataDataend Thanks endDataDataDataHeader Get Descriptor Specific Methods: • Put Stream: Client sends a stream to server • Get Stream: Server sends a stream to client • Both Initiated by Client
  • 4. Apache Arrow Endpoint: Retrieved with Ticket Flight Location 1 Location 2 Arrow Messaging Paradigm: Stream Management • Parallel consumption and locality awareness – A flight is composed of streams – Each stream has a FlightEndpoint: A opaque stream ticket along with a consumption location – Systems can take advantage of location information to improve data locality • Flights have two reference systems: – Dotted path namespace for simple services (e.g. marketing.yesterday.sales) – Arbitrary binary command descriptor: (e.g. “select a,b from foo where c > 10”) • Support for Stream Listing – ListFlights(Criteria) – GetFlightInfo(FlightDescriptor) Stream Stream Stream Stream
  • 5. Apache Arrow Arrow Messaging Paradigm: Data as a Service Customization • Arrow Flight Also support a simple Generic Messaging Framework – Support Customization and Extensibility within the Arrow Flight context • ListActions() – Each Data Service can expose actions along with descriptions about what they support – Each action should describe how to structure the action and corresponding result – Normal HTTP2 exceptions can be used to manage error states • DoAction(Action) => Result – Generic Containers that can carry execute Data Service specific operations – Examples might include: forget stream, load stream from disk, • Actions and Results, each have: – ActionType String token – Body: JSON body of instruction • Arrow Flight Clients can be written without knowledge of custom Actions/Results – Lightweight wrappers can be built for Data Services as needed – Or Simply use existing JSON tooling on top of generic API
  • 6. Apache Arrow But How? GRPC as a Foundation • Generic RPC generation framework • Built on HTTP/2 Standard • Many language bindings (see right) • Supports security &compression • Uses Protobuf as primary format • Designed primarily for application messaging
  • 7. Apache Arrow Extend GRPC To Better Work With Arrow Streams • Streams are valid Protobuf Objects so systems that don’t have custom processing can still consume Arrow streams – The entirety of the Arrow RecordBatch is a single length delimited Protobuf “bytes” field. • For high performance situations, do direct byte encoding and one-copy reads/zero-copy writes to avoid extra copies/overhead – Java Flight implementation cuts through multiple layers to achieve this using currently released GRPC (despite no formal support for it).
  • 8. Apache Arrow Check it out • Arrow Flight Proposal – https://github.com/jacques-n/arrow • Example Usage in Dremio Formation – https://github.com/jacques-n/formation