Martin Decky Microkernels Capabilities
Operating Systems
Martin Děcký
March 2021
About the Speaker
Charles University
Research scientist at D3S (2008 – 2017)
Graduated (Ph.D.) in 2015
Co-author of the HelenOS (http://www.helenos.org/) microkernel
multiserver operating system
Huawei Technologies
Senior Research Engineer, Munich Research Center (2017 – 2018)
Principal Research Engineer, Dresden Research Center (2019 – present)
Martin Děcký, March 25th 2021 Microkernel-based and Capability-based Operating Systems 2
[Figure: world map of Huawei research sites, with a legend distinguishing country research centers (RC), city research centers and city R&D centers; it shows HQ Shenzhen and sites across China, Europe, Canada, India, Israel, Japan and Singapore]
Huawei Dresden Research Center (DRC)
Since 2019, ~20 employees (plus a virtualization team in Munich)
Huawei Dresden Research Center (DRC) (2)
Focuses on R&D in the domain of operating systems
Microkernels, hypervisors
Collaboration with the OS Kernel Lab in Huawei HQ
Collaboration with TU Dresden, MPI-SWS, ETH Zürich and other institutions
Formal verification of correctness, weak memory architectures
Safety and security certification
Many-core scalability, heterogeneous hardware
Flexible OS architecture
We Are Hiring
Operating System Engineer / Researcher (Dresden)
https://apply.workable.com/huawei-16/j/3BAC3458E6/
Formal Verification Engineer / Researcher (Dresden)
https://apply.workable.com/huawei-16/j/95CCAD4EC5/
Virtualization Engineer / Researcher (Munich)
https://apply.workable.com/huawei-16/j/51F90678EA/
Industrial Ph.D. Student (Dresden)
In collaboration with TU Dresden
Systems Software Innovations Summit 2021
March 30th – 31st 2021
https://huawei-events.de/, on-line, no participation fee
Microkernels
Microkernel-based Operating Systems
Motivation
Safety, security, reliability, dependability
Proper software architecture
Formal verification of correctness
Modularity, customization
Virtualization, paravirtualization
Tasks and virtual machines are quite similar types of entities
Partitioning, support for mixed criticality
Monolithic OS Design Is Flawed
Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is
Flawed: Microkernel-based Designs Improve Security, ACM 9th Asia-
Pacific Workshop on Systems (APSys), 2018
“While intuitive, the benefits of the small TCB have not been quantified to
date. We address this by a study of critical Linux CVEs, where we examine
whether they would be prevented or mitigated by a microkernel-based
design. We find that almost all exploits are at least mitigated to less than
critical severity, and 40 % completely eliminated by an OS design based
on a verified microkernel, such as seL4.”
https://dl.acm.org/doi/10.1145/3265723.3265733
Some Data Points from History
Compatible Time-Sharing System (CTSS)
John McCarthy, MIT Computation Center, 1961
Probably one of the earliest “real” operating systems
Not just a loader, jobs manager or batch manager
RC 4000 Multiprogramming System
Per Brinch Hansen, Regnecentralen, 1969
Separation of mechanism and policy, modularity via isolated concurrently running
processes, message passing
Multics
MIT, General Electric, Bell Labs, 1969
Traceable influence on UNIX
Some Data Points from History (2)
HYDRA
William Wulf, Carnegie Mellon University, 1971
Capability-based, object-oriented, separation of mechanism and policy
Probably the earliest peer-reviewed publication of the design principles
UNIX
Ken Thompson, Dennis Ritchie, Brian Kernighan et al., Bell Labs, 1973
Architecture and design traceable in many current monolithic systems
VMS
Digital Equipment, 1977
Architecture and design traceable in Microsoft Windows
Some Data Points from History (3)
EUMEL / L2
Jochen Liedtke, University of Bielefeld, 1979
Proto-microkernel based on bitcode virtual machines
QNX
Gordon Bell, Dan Dodge, 1982
Earliest commercially successful microkernel multiserver OS
Still in active use and development today
CMU Mach
Richard Rashid, Avie Tevanian, Carnegie Mellon University, 1985
Arguably the most widespread microkernel code base
Still a core part of macOS, iOS and other OS clones by Apple today (but not in a microkernel configuration)
Despite its well-publicized shortcomings, it remains highly influential
Microkernel-based Operating Systems
Definition
Operating system that follows specific design principles that, in effect,
minimize the amount of code running in the privileged (kernel) mode
Hence the name
Every microkernel-based OS follows slightly different specific design
principles
Two design principles are probably universally common
Minimality principle
Split of mechanism and policy principle
Minimality Principle
The obvious criterion
The kernel needs to implement the functionality that cannot possibly be
implemented in user space
On typical commodity hardware, this includes
Bootstrapping
Fundamental part of hardware exception and interrupt handling
Configuration of certain control registers (possibly including MMU)
Fundamental part of mode switching (e.g. related to hardware virtualization,
trusted execution environments, etc.)
Minimality Principle (2)
The necessary criterion
The kernel needs to implement the functionality that cannot be
delegated only to a trusted user space component without also
delegating it to any untrusted user space component (thus undermining
the fundamental guarantees that the operating system provides)
On typical commodity hardware, this includes
Configuration of the forced preemption mechanism (e.g. timer interrupt
routing)
Fundamental part of interacting with a hypervisor, firmware and some hardware
components
– Hardware components are tricky: Without IOMMU, almost any interaction
with hardware might potentially undermine the OS guarantees
Minimality Principle (3)
The practicality criterion
The kernel might also implement the functionality that would be
impractical (while still technically possible) to delegate safely to user
space
This is where microkernels differ, but there are still some universal examples
Context switching
Basic scheduling
System timer configuration
Observability and (optional) debugging support
Split of Mechanism and Policy Principle
Orthogonal to the minimality principle
The microkernel is not an indivisible entity
Composed of instructions, basic blocks, language constructs, etc.
The code inevitably follows some patterns that form architecture, design,
abstractions, parametrization, etc.
Separation of concerns
The kernel implements only pure and universal mechanisms (“the what”)
while the policies (“the how/when”) are delegated to user space
This is where microkernels differ
– Does “arbitrary policy” equal “no policy”?
– Is it fine to have a default (but replaceable) policy?
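The split of mechanism and policy can be illustrated with a small C sketch. It is not taken from any particular microkernel: `task_t`, `dispatch` and both policy functions are hypothetical names. The mechanism only validates and carries out a decision, while the decision itself comes from a replaceable function that stands in for a user space policy.

```c
#include <stddef.h>

/* All names here are hypothetical and for illustration only. */
typedef struct {
    int id;
    int priority;
    int runnable;
} task_t;

/* Policy: decide which task should run next.
 * Returns an index into tasks[], or -1 if nothing should run. */
typedef int (*sched_policy_t)(const task_t *tasks, size_t count);

/* Mechanism: validate the decision and perform the switch.
 * It knows how to switch, but not which task deserves the CPU. */
int dispatch(const task_t *tasks, size_t count, sched_policy_t policy)
{
    int pick = policy(tasks, count);
    if (pick < 0 || (size_t)pick >= count || !tasks[pick].runnable)
        return -1;            /* reject an invalid policy decision */
    return tasks[pick].id;    /* a real kernel would context-switch here */
}

/* Policy A: the first runnable task wins. */
int policy_first_runnable(const task_t *tasks, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (tasks[i].runnable)
            return (int)i;
    return -1;
}

/* Policy B: the runnable task with the highest priority wins. */
int policy_highest_priority(const task_t *tasks, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++)
        if (tasks[i].runnable &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;
    return best;
}
```

Swapping `policy_first_runnable` for `policy_highest_priority` changes the behavior without touching `dispatch`, which is the property the slide asks of a microkernel.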
Practical Differences
Monolithic kernel
Configurability via compile-time options and parametrization
Modularity via run-time dynamic linking
Tight module coupling, weak module cohesion
Structure is implicit and not enforced (especially at run time)
Microkernel
Configurability via different use (policy in user space)
Modularity via extension in user space
Loose module coupling, strong module cohesion
Structure is explicit and enforced (even at run time)
Design Space of Operating Systems
[Figure: design space of operating systems, built up over several slides. Axes: monolithic components vs. fine-grained components; static deployment vs. dynamic deployment; safety via isolation vs. raw performance. Systems placed in the space: unikernel OS, separation kernel, microkernel single server OS, microkernel multiserver OS, hypervisor, monolithic kernel OS]
Architecture of User Space
Monolithic OS
[Figure: three applications in unprivileged mode; the monolithic kernel in privileged mode; hardware underneath]
Architecture of User Space
Single-server microkernel OS
[Figure: three applications and a single system server (device drivers, file system drivers, user mgmt, network stack, ...) in unprivileged mode; the microkernel (memory mgmt, scheduler, IPC) in privileged mode; hardware underneath]
Architecture of User Space
Multiserver microkernel OS
[Figure: three applications above the microkernel (memory mgmt, scheduler, IPC), which runs in privileged mode; hardware underneath]
Architecture of User Space
Type-1 hypervisor
[Figure: three operating systems, each running its own apps, above the hypervisor (memory mgmt, scheduler, comm), which runs in hyper-privileged mode; hardware underneath]
Architecture of User Space
Type-1 hypervisor (in common deployment)
[Figure: three operating systems above the hypervisor (memory mgmt, scheduler, comm), which runs in hyper-privileged mode; hardware underneath]
Architecture of User Space
Hypervisor with unikernels
[Figure: three unikernels above the hypervisor (memory mgmt, scheduler, comm), which runs in hyper-privileged mode; hardware underneath]
Architecture of User Space
Multiserver microkernel with unikernels for device drivers
[Figure: three applications and a unikernel above the microkernel (memory mgmt, scheduler, IPC), which runs in privileged mode; hardware underneath]
Architecture of User Space
Multikernel
[Figure: three kernels side by side in privileged mode]
Capabilities
Capabilities
Motivation
A universal and pure mechanism in the kernel to safely manage (all)
operating system resources
Without implementing any specific management policy in the kernel
(i.e. delegating the management policy completely to user space)
Potential secondary goal
Possibility to grant or delegate (parts of) the authority over resources from
the original owner of a resource to other users
In a controllable fashion (i.e. including the possibility of revocation)
Capabilities
Definition
Capability
Object (instance of a given object type) identifying some specific (operating system) resource
Kernel object identifying a kernel-managed resource
Kernel object (proxy object) identifying a user space resource
User space object identifying a user space resource
Capability reference
Unforgeable identifier (handle) to a capability
Might be associated with permissions (e.g. permissible operations, methods) and ownership
Capability space
Each capability reference is local to a specific namespace (typically associated with a specific
task, process) and does not have any meaning in other namespaces
Akin to (virtual) address space
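The three terms defined above can be sketched in C. This is a minimal model under stated assumptions, not the layout of any real kernel: `kobject_t`, `cap_slot_t`, `cspace_t`, `cap_ref_t` and `cap_lookup` are hypothetical names. The capability reference is just an index that only makes sense inside one capability space, and the kernel checks it on every use, which is what makes it unforgeable from the point of view of user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits attached to a capability reference. */
enum { PERM_READ = 1u << 0, PERM_WRITE = 1u << 1 };

#define CSPACE_SLOTS 8

/* The capability: an object identifying some resource. */
typedef struct { const char *name; } kobject_t;

/* One entry of a capability space. */
typedef struct {
    kobject_t *object;   /* NULL means the slot is empty */
    uint32_t perms;      /* operations this reference permits */
} cap_slot_t;

/* The capability space: one table per task. */
typedef struct { cap_slot_t slots[CSPACE_SLOTS]; } cspace_t;

/* The capability reference: only an index, meaningless on its own. */
typedef uint32_t cap_ref_t;

/* Resolve a reference inside one capability space.
 * Returns NULL if the reference is out of range, empty, or lacks
 * the permissions the caller needs. */
kobject_t *cap_lookup(cspace_t *cs, cap_ref_t ref, uint32_t needed)
{
    if (ref >= CSPACE_SLOTS)
        return NULL;
    cap_slot_t *slot = &cs->slots[ref];
    if (slot->object == NULL)
        return NULL;
    if ((slot->perms & needed) != needed)
        return NULL;
    return slot->object;
}
```

The same numeric reference, say 3, may resolve to different objects in two capability spaces, in the same way that the same virtual address maps to different frames in two address spaces.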
What Are Capabilities, Anyway?
read(0, ...);   // 0 is a file descriptor, i.e. a capability reference
[Figure: the file descriptor is held in user space; the object it refers to lives in kernel space]
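The view of a POSIX file descriptor as a capability reference can be checked with a short C sketch. The function name `demo_fd_is_reference` is made up for this example; `pipe`, `dup`, `read`, `write` and `close` are the standard POSIX calls. Two different descriptor numbers end up naming one kernel object, so data consumed through one is no longer available through the other.

```c
#include <unistd.h>

/* Returns 0 if two descriptors obtained via dup() behave as two
 * references to one kernel object, -1 otherwise.
 * Error paths return early without cleanup to keep the sketch short. */
int demo_fd_is_reference(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "ab", 2) != 2)
        return -1;

    /* A second reference: a new number, the same kernel object. */
    int alias = dup(p[0]);
    if (alias < 0)
        return -1;

    char x = 0, y = 0;
    if (read(p[0], &x, 1) != 1)     /* consumes 'a' */
        return -1;
    if (read(alias, &y, 1) != 1)    /* continues with 'b' */
        return -1;

    int ok = (alias != p[0] && x == 'a' && y == 'b');
    close(alias);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

In capability terms, `dup` plays the role of a clone operation within one capability space.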
Capability Granting
Sender:
struct msghdr msg;
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
// ...
memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));
sendmsg(socket, &msg, 0);
Receiver:
struct msghdr msg;
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
// ...
recvmsg(socket, &msg, 0);
int fd;
memmove(&fd, CMSG_DATA(cmsg), sizeof(fd));
[Figure: descriptor tables of the sender (0 1 2 3) and of the receiver (0 1 2 3 4, i.e. one new entry after the transfer), with entries in both tables referring to the same vfs_file_t kernel object]
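The fragments on the slide elide the control-message setup. A fuller sketch of passing a descriptor with `SCM_RIGHTS` over a UNIX domain socket might look as follows. The helper names `send_fd`, `recv_fd` and `demo_grant` are assumptions made for this example, and for brevity both ends live in one process connected by `socketpair`, whereas real granting would cross a process boundary.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} fd_ctrl_t;

/* Send descriptor fd over the UNIX domain socket sock; 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'x';                       /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    fd_ctrl_t ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* payload is a descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memmove(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    fd_ctrl_t ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memmove(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;                             /* a new number chosen by the kernel */
}

/* Grant the read end of a pipe across a socket pair and read through
 * the received descriptor; returns 0 if everything behaves as expected. */
int demo_grant(void)
{
    int sv[2], p[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0 || pipe(p) != 0)
        return -1;
    if (write(p[1], "k", 1) != 1)
        return -1;
    if (send_fd(sv[0], p[0]) != 0)
        return -1;
    int granted = recv_fd(sv[1]);
    if (granted < 0)
        return -1;
    char c = 0;
    if (read(granted, &c, 1) != 1)
        return -1;
    return (granted != p[0] && c == 'k') ? 0 : -1;
}
```

The integer written into the control message is not what arrives: the kernel installs a fresh reference in the receiver's descriptor table, which is why the received number generally differs from the one sent.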
Chicken & Egg Problem
What if we want to represent all resources as capabilities?
Even the resource (memory) needed to store the capabilities and
capability references is a capability
We start with some basic capability (untyped capability) that represents
(physical) memory
Encapsulated capability vs. naked capability
This capability can be retyped to a different capability or converted to multiple
capabilities
– Allocating kernel objects
– Allocating capability nodes that bind capability references to capabilities
Bookkeeping objects (e.g. memory for page tables) might also be represented as
capabilities
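Retyping an untyped capability can be sketched in C as a simple watermark allocator. This is an illustration only: `cap_t`, `cap_type_t`, `cap_retype` and `PAGE_BYTES` are hypothetical names, and a real kernel would also enforce alignment, object sizes per type and derivation tracking.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size for this sketch */

typedef enum { CAP_UNTYPED, CAP_CNODE, CAP_TCB, CAP_PAGE_TABLE } cap_type_t;

typedef struct {
    cap_type_t type;
    uintptr_t base;      /* physical start address of the region */
    size_t size;         /* bytes covered by this capability */
    size_t watermark;    /* untyped only: bytes already handed out */
} cap_t;

/* Carve `size` bytes out of an untyped capability and describe them
 * with a new capability of type `new_type` (which may itself be
 * untyped, i.e. a split). Returns 0 on success, -1 if the source is
 * not untyped or does not have enough unused memory left. */
int cap_retype(cap_t *untyped, cap_type_t new_type, size_t size, cap_t *out)
{
    if (untyped->type != CAP_UNTYPED)
        return -1;
    if (size == 0 || size > untyped->size - untyped->watermark)
        return -1;

    out->type = new_type;
    out->base = untyped->base + untyped->watermark;
    out->size = size;
    out->watermark = 0;

    untyped->watermark += size;   /* this memory is now spoken for */
    return 0;
}
```

Because every kernel object, including the memory for capability nodes, has to come out of some untyped capability, the kernel itself never decides how memory is divided; whoever holds the untyped capabilities does.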
Capability Derivation Tree
Permissible ways of retyping capabilities
[Figure: capability derivation tree rooted at an untyped cap (10 pages); nodes shown below it: two untyped caps (1 page each), a cnode cap, a TCB cap, an L1 PT cap and an L2 PT cap]
Representing Capability Space
Effective and efficient storage for capability nodes
Criteria
Low memory overhead and fragmentation even for sparse capability spaces
Fast lookup of capability references (typically the most frequent operation)
Reasonably fast creation and removal of new capability references
Possibility to store metadata (e.g. permissions, ownership/delegations) and even
actual kernel objects (up to a certain size) in-line
Typical candidates
Arrays
Hash tables
Radix trees
Hierarchical Capability Space
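[Figure: a cref_t held in user space is split into index fields (shown as 00 01 11); in kernel space each field selects a slot in a cnode_t (10 bit index per level), walking from cnode cap to cnode cap until a leaf is reached, such as an untyped cap or a cap for a mem_region_t resource]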
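A hierarchical capability space behaves like a radix tree indexed by bit fields of the reference. The sketch below assumes exactly two levels of 10 bits each. The type names `cref_t` and `cnode_t` follow the slide, while `slot_t`, `cref_make` and `cref_resolve` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_BITS 10u
#define SLOTS      (1u << INDEX_BITS)
#define INDEX_MASK (SLOTS - 1u)

typedef uint32_t cref_t;

typedef enum { SLOT_EMPTY = 0, SLOT_CNODE, SLOT_OBJECT } slot_kind_t;

struct cnode;

/* A slot is empty, points to a next-level cnode, or names an object. */
typedef struct {
    slot_kind_t kind;
    union {
        struct cnode *cnode;
        void *object;
    } u;
} slot_t;

typedef struct cnode { slot_t slots[SLOTS]; } cnode_t;

/* Pack two 10 bit indices into one capability reference. */
cref_t cref_make(uint32_t top, uint32_t leaf)
{
    return ((top & INDEX_MASK) << INDEX_BITS) | (leaf & INDEX_MASK);
}

/* Walk the two levels; returns the object or NULL on any miss. */
void *cref_resolve(const cnode_t *root, cref_t ref)
{
    if (ref >> (2 * INDEX_BITS))
        return NULL;                      /* bits beyond two levels */

    const slot_t *top = &root->slots[(ref >> INDEX_BITS) & INDEX_MASK];
    if (top->kind != SLOT_CNODE)
        return NULL;

    const slot_t *leaf = &top->u.cnode->slots[ref & INDEX_MASK];
    return leaf->kind == SLOT_OBJECT ? leaf->u.object : NULL;
}
```

Only the second-level cnodes that are actually in use need to be allocated, which keeps the memory overhead low for sparse capability spaces while the lookup stays a fixed number of array accesses.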
Capability Operations
Actions that can be performed with capabilities
The permissible set of operations might be defined/restricted by the capability
reference itself
Each capability reference might permit different methods despite pointing to the
same object
Invoke
Executing some “business logic” operation on the target object
Clone
Creating a duplicate capability reference
Mint
Creating a duplicate capability reference, but with restricted permissions
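Invoke, clone and mint can be sketched in C with a permission mask stored in the reference. All names (`cap_handle_t`, `RIGHT_READ`, `cap_clone`, `cap_mint`, `cap_invoke_read`, `cap_invoke_write`) are hypothetical, and the target object is reduced to a single integer.

```c
#include <stddef.h>
#include <stdint.h>

enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1 };

typedef struct { int value; } object_t;

/* A capability reference: which object, and what may be done to it. */
typedef struct {
    object_t *object;
    uint32_t rights;
} cap_handle_t;

/* Clone: an exact duplicate of the reference. */
cap_handle_t cap_clone(cap_handle_t src)
{
    return src;
}

/* Mint: a duplicate whose rights are the intersection of the source
 * rights and the requested mask, so rights can only shrink. */
cap_handle_t cap_mint(cap_handle_t src, uint32_t mask)
{
    cap_handle_t out = src;
    out.rights = src.rights & mask;
    return out;
}

/* Invoke (read method): 0 on success, -1 if the reference lacks it. */
int cap_invoke_read(cap_handle_t ref, int *out)
{
    if (ref.object == NULL || !(ref.rights & RIGHT_READ))
        return -1;
    *out = ref.object->value;
    return 0;
}

/* Invoke (write method): 0 on success, -1 if the reference lacks it. */
int cap_invoke_write(cap_handle_t ref, int value)
{
    if (ref.object == NULL || !(ref.rights & RIGHT_WRITE))
        return -1;
    ref.object->value = value;
    return 0;
}
```

Two references to the same object can therefore permit different methods: a read-only reference minted from a read-write one still reaches the object, but its write invocation is refused.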
Capability Operations (2)
Derive
Retyping the capability to a different capability type or converting it to multiple
capabilities
Permissible retyping/conversions defined by the capability derivation tree
Delegate
Passing the ownership of the capability reference to a different capability space
Grant
Creating a duplicate capability reference (possibly with restricted permissions) in a
different capability space (while keeping ownership)
Might be done only once or recursively
Revoke
Removing a granted capability reference from a different capability space
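Grant and revoke can be modelled by recording, for each granted reference, where it was granted from. The C sketch below is a deliberately small model with fixed-size tables. `gslot_t`, `system_t`, `cap_grant` and `cap_revoke` are hypothetical names, and recursive granting is handled by a recursive revoke.

```c
#include <stddef.h>
#include <stdint.h>

#define SPACES          3
#define SLOTS_PER_SPACE 4

typedef struct {
    void *object;        /* NULL: empty slot */
    uint32_t rights;
    int parent_space;    /* -1 for an original reference */
    int parent_slot;
} gslot_t;

/* All capability spaces of the system in one table. */
typedef struct { gslot_t space[SPACES][SLOTS_PER_SPACE]; } system_t;

static int valid(int space, int slot)
{
    return space >= 0 && space < SPACES &&
           slot >= 0 && slot < SLOTS_PER_SPACE;
}

/* Grant: duplicate a reference into another capability space with
 * rights restricted by mask. The granter keeps its own reference.
 * Returns the new slot index in to_space, or -1 on failure. */
int cap_grant(system_t *sys, int from_space, int from_slot,
              int to_space, uint32_t mask)
{
    if (!valid(from_space, from_slot) || !valid(to_space, 0))
        return -1;
    gslot_t *src = &sys->space[from_space][from_slot];
    if (src->object == NULL)
        return -1;
    for (int i = 0; i < SLOTS_PER_SPACE; i++) {
        gslot_t *dst = &sys->space[to_space][i];
        if (dst->object == NULL) {
            dst->object = src->object;
            dst->rights = src->rights & mask;
            dst->parent_space = from_space;   /* remember the granter */
            dst->parent_slot = from_slot;
            return i;
        }
    }
    return -1;                                /* target space is full */
}

/* Revoke: remove every reference granted from (space, slot),
 * including references that were granted onward from those. */
void cap_revoke(system_t *sys, int space, int slot)
{
    for (int s = 0; s < SPACES; s++) {
        for (int i = 0; i < SLOTS_PER_SPACE; i++) {
            gslot_t *c = &sys->space[s][i];
            if (c->object != NULL &&
                c->parent_space == space && c->parent_slot == slot) {
                cap_revoke(sys, s, i);        /* onward grants first */
                c->object = NULL;
            }
        }
    }
}
```

The scan over all spaces keeps the sketch short. A production design would more likely keep explicit child links so that revocation cost depends on the number of grants rather than on the size of the system.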
Get to Know
Microkernels
HelenOS Microkernel Functional Blocks
[Figure: functional blocks of the HelenOS microkernel, grouped into architecture independent, shared architecture and architecture dependent layers; legible block labels include kernel console, ELF loader, kernel log, system information, tracing support, interrupt handling and debugging support]
HelenOS User Space Architecture
[Figure: HelenOS user space components above the kernel; labels shown: remote console, remote framebuffer, console, compositor, vterm, bdsh, clipboard, audio, client session, input, output, human interface, TMPFS, Location FS, vfs, device manager, inetsrv, naming service, loader, task monitor, init]
HelenOS User Space Device Drivers
[Figure: HelenOS user space device drivers. Categories shown: block device drivers, class drivers, audio drivers, network interface drivers, bus drivers, character device drivers, clock drivers, architecture root drivers, virtual. Drivers shown: mbr, guid, ahci, ddisk, USB mass storage, ata, RAM disk, UHCI root hub, USB hub, USB MID, USB HID, sb16, hdaudio, ne2000, s3c24xx, rtl8139, rtl8169, isdv4, xtkbd, ar9271, e1000, uhci, xhci, ps2, i8042, pci, ohci, ehci, pl050, ns8250, isa, amba, cmos-rtc]
Genode OS Framework
[1] Feske N.: Introducing kernel-agnostic Genode executables, Genode Labs, FOSDEM 2017,
https://fosdem.org/2017/schedule/event/microkernel_kernel_agnostic_genode_executables/
Further Reading
Du D., Hua Z., Xia Y., Zang B., Chen H.: XPC: Architectural Support for
Secure and Efficient Cross Process Call, ACM/IEEE 46th Annual
International Symposium on Computer Architecture (ISCA), 2019
https://ieeexplore.ieee.org/abstract/document/8980352
Lange M.: The impact of Meltre and Specdown on microkernel
systems (*), Microkernel Devroom, FOSDEM, 2019
(*) Deliberate misspelling of Meltdown and Spectre
https://archive.fosdem.org/2019/schedule/event/meltre_specdown/
Q&A
Thank You!