Kaitai Struct for binary data format parsing
2021
www.luxoft.com
Problem:
● [Alice] I created a new binary format! Here is the C code to parse and construct it!
● [Bob] Do you have a Python binding?
● [Charly] What about Java?
● [Dave] Oh, I would like to use it in my PHP web application.
● [Eve] I can’t use it for iOS development; there are no Objective-C or Swift utilities.
● But, but, but... C bindings... Okay!!
Declarative and imperative
Here is my protocol description; it's so easy to understand. Why can’t you create a binding for your own language?
Okay, here is a declarative description:
Protocol description is not standardized
Need to support several bindings for each file format.
[Diagram: data structures in each language (C, C++, Python, Java, Go, Rust, ...) linked to each binary data format (PNG, ICMP, MP3, TCP, CAN, ZIP, ...): N x N bindings]
One file to rule them all
Was it resolved before? XML or JSON?
One file to rule them all
Kaitai Struct benefits
➔ Documented and supported: https://doc.kaitai.io/
➔ Bindings for a bunch of languages: C++/STL, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby
➔ Awesome tools for debugging and visualization: https://github.com/kaitai-io/awesome-kaitai
➔ Cross-platform
➔ Open-source: https://github.com/kaitai-io
➔ Modern and sexy
➔ Parsing and constructing code that is already tested
➔ Most common formats are already described (from fonts to filesystems): https://formats.kaitai.io/
KSY file format
● meta: Contains metadata about the target binary format being parsed, such as its identifier or the default endianness.
● seq: Describes an ordered sequence of elements (attributes), each with an identifier, type, and size (or literal contents, e.g., magic numbers).
● enums: Maps integer constants to symbolic names for clarity; an attribute can then reference them using the enum key.
● types: Declares user-defined named types, each of which can contain any of the elements above, including nested types.
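As a rough illustration of how these four sections fit together, here is a minimal sketch of a .ksy file (in a real file the last two top-level keys are spelled `enums` and `types`). The format itself is hypothetical: `demo_packet`, `packet_kind`, `payload`, and every field name are assumptions made up for this example, not part of any real format.

```yaml
# Hypothetical format, for illustration only.
meta:
  id: demo_packet            # assumed name; used to derive the generated class name
  endian: le                 # default byte order for multi-byte integers
seq:
  - id: magic
    contents: [0x44, 0x4d]   # literal bytes "DM" expected at the start
  - id: kind
    type: u1
    enum: packet_kind        # expose the integer under a symbolic name
  - id: body
    type: payload            # user-defined type declared under `types`
enums:
  packet_kind:
    1: request
    2: response
types:
  payload:
    seq:
      - id: len
        type: u2
      - id: data
        size: len            # byte array whose length was read just before
```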
Bitcoin transaction parsing
Protobuf file parsing example:
Protobuf file
Format investigation: Web IDE
IDL meta section
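The slide's own listing is not preserved here, so the following is only a sketch of what a `meta` section might contain. The keys are standard KSY keys; the values (`demo_packet`, `dmp`, and so on) are hypothetical.

```yaml
# Fragment of a .ksy file; all values are assumptions for illustration.
meta:
  id: demo_packet        # identifier in lower_snake_case
  title: Demo packet     # human-readable name of the format
  file-extension: dmp    # assumed extension
  endian: be             # default endianness for types such as u2 and u4
  ks-version: 0.8        # minimum Kaitai Struct compiler version required
```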
IDL seq section
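The original listing for this slide is missing; a minimal sketch of a `seq` section with fixed-size fields could look like the fragment below. The field names and the `DEMO` magic are hypothetical.

```yaml
# Fragment of a .ksy file; fields are parsed in the order listed.
seq:
  - id: magic
    contents: 'DEMO'     # literal bytes expected at this position
  - id: version
    type: u2le           # unsigned 2-byte integer, little-endian
  - id: flags
    type: u1             # unsigned 1-byte integer
  - id: name
    type: str
    size: 16             # fixed-size string of 16 bytes
    encoding: ASCII
```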
IDL seq section: Variable length
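As a sketch of the usual ways to describe variable-length data in `seq` (the field names are assumptions, since the slide's listing is not preserved): a size taken from an earlier field, a zero-terminated string, a repetition driven by a count, and "everything up to the end of the stream".

```yaml
# Fragment of a .ksy file; all field names are hypothetical.
seq:
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len             # length comes from the previous field
    encoding: UTF-8
  - id: comment
    type: strz                 # string read up to a terminating zero byte
    encoding: UTF-8
  - id: num_records
    type: u2le
  - id: records
    type: u4le
    repeat: expr
    repeat-expr: num_records   # array with num_records elements
  - id: trailer
    size-eos: true             # all remaining bytes until end of stream
```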
Custom types
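A minimal sketch of a user-defined type, assuming a hypothetical `demo_track` format that stores a list of points: the `point` type is declared once under `types` and then used like a built-in type in `seq`.

```yaml
# Hypothetical format, for illustration only.
meta:
  id: demo_track
  endian: le
seq:
  - id: num_points
    type: u2
  - id: points
    type: point                # custom type declared below
    repeat: expr
    repeat-expr: num_points
types:
  point:
    seq:
      - id: x
        type: s4               # signed 4-byte integer
      - id: y
        type: s4
```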
Enums, conditions, instances
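The three features named in this title can be combined in one small sketch. The `demo_shape` format, the `shape_kind` enum, and the threshold of 100 are assumptions chosen only to show the syntax.

```yaml
# Hypothetical format, for illustration only.
meta:
  id: demo_shape
  endian: le
seq:
  - id: kind
    type: u1
    enum: shape_kind           # integer exposed as a symbolic name
  - id: width
    type: u2
  - id: height
    type: u2
    if: kind == shape_kind::rectangle   # parsed only for rectangles
instances:
  is_wide:
    value: width > 100         # calculated on access, not read from the stream
enums:
  shape_kind:
    1: circle
    2: rectangle
```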
Explore generated code (optional)
Summary
Benefits:
▪ Open-source
▪ Great tooling
▪ A lot of bindings
▪ Many binary formats already implemented
Drawbacks:
▪ Implemented in Scala, but doesn't always work out-of-the-box
▪ Current version is 0.8 (not stable enough)
▪ Generation for some languages requires some workarounds
Thank You!

Oleksandr Kutsan "Using Kaitai Struct to describe the process of working with binary data formats"