FORMAL METHODS AT SCALE

Developing With Formal Methods at BedRock Systems, Inc.

Gregory Malecha, Gordon Stewart, František Farka, Jasper Haag, and Yoichi Hirai | BedRock Systems, Inc.

The BedRock HyperVisor (trademarked) is a commercial, highly concurrent, verified virtualization platform that employs formal methods to enable proofs of complex, lock-free concurrent code; support automating proofs of large programs; and integrate with “informal” parts of the software lifecycle.

Digital Object Identifier 10.1109/MSEC.2022.3158196
Date of current version: 20 April 2022

Building on academic research but with feet firmly planted in industrial applications, BedRock Systems is in the process of building the BedRock HyperVisor (BHV, trademarked), a formally verified, highly concurrent, microkernel-based commercial hypervisor. By formally verified, we mean that the C++ and assembly-code implementation of the operating system is proved correct in the Coq proof assistant.1 By highly concurrent, we mean that we use, and verify, lock-free data structures in core parts of the implementation. That the BHV is microkernel based means that most of the system runs in user mode on top of a small, kernel-level program, which provides bare-bones abstractions such as address spaces and interprocess communication (IPC).

To verify the BHV, BedRock Systems is pioneering the use of formal methods (FM) at scale. This experience report explains how we use FM and why. Our experience shows that advances in FM techniques finally enable them to integrate well in the standard software development process. In essence, FM-based software development is “just” mathematically rigorous software engineering. Additionally, the design aims of FM align with software engineering best practices. Further, our experiences suggest that FM techniques are increasingly able to directly address (and in some cases, improve upon) current best practices in software engineering.

Despite the alignment of aims, the road is not always an easy one. Pioneering FM at scale means that we must build many tools and libraries ourselves. Although FM tools still lag behind more mainstream tools, we believe they have matured to the point of being usable in an industrial context. Further, development and adoption of these tools is growing, and we anticipate the situation will continue to improve.

Beyond the tools themselves, writing verified software, even in mainstream languages, still suffers from a dearth of libraries. One of the main benefits of mature ecosystems such as C++ is the availability of libraries, but very few of these libraries are formally specified, let alone verified. Our own work has already begun to address this problem internally, and we believe that it is just a matter of time before developers are able to use libraries that are formally verified.

Outline
We report on BedRock Systems’ work to build a verified, performant hypervisor. Achieving this goal requires
Authorized licensed use limited to: Georgetown University. Downloaded on May 16,2023 at 17:49:10 UTC from IEEE Xplore. Restrictions apply.
1540-7993/22©2022IEEE Copublished by the IEEE Computer and Reliability Societies May/June 2022 33
system-, as opposed to program-level, FM. These requirements inform our FM process, which starts in the design phase. Here, the ability to consider problems abstractly enables us to rapidly explore the design space. This process often produces replicable patterns that can be precisely documented, informing future design and development.

We connect this design-level formalization to the running code through the rest of the software engineering process. Making this connection formal ultimately delivers the correctness guarantees that FM advertises, e.g., eliminating bugs, but it also brings two challenges: the development of a program logic for a mainstream programming language, C++, and extensible automation to translate high-level reasoning principles into robust proofs about low-level code.

Scaling FM requires solving nontechnical problems as well. We explain how we manage our FM development process for both predictability and reliability. Our approach is agile based and centers on daily standups and frequent code reviews. We focus our efforts on concentrated verticals, which we call spikes, to ensure that specifications are both usable by clients and provable for implementers. Beyond delivering better code, focused group work also improves onboarding and on-the-job training, which are crucial when working on the bleeding edge. Our experiences suggest that driving otherwise open-ended research from concrete “in the code” problems is crucial to predictable execution, but can be a difficult shift from a more theoretical academic mindset.

The BHV
The aim of FM at BedRock Systems is to develop a flexible and ultrasecure compute platform. For flexibility, the BHV follows a microkernel architecture (see Figure 1). Independent applications provide decentralized services for features such as virtual compute and networking. This modular architecture enables us to customize the BHV’s feature set by selecting different applications in different contexts. However, it also requires us to establish the system’s correctness in a similarly modular fashion.

[Figure 1 diagram: the BedRock bare-metal property (behavior on the VMM refines behavior on hardware, up to Active Security) covering an untrusted guest; an application layer with Active Security, the VMM, the vSwitch, UMX, drivers, and an untrusted customer service; the master controller (root task) and userland; the NOVA microhypervisor; and hardware, together forming the BHV.]
Figure 1. The BHV. Top-level applications such as the Virtual Machine Monitor (VMM) and virtual switch (vSwitch) sit atop the master controller. The master controller provides support for userland services, which are used by independent programs to provide application-specific services. The NOVA microhypervisor provides minimal primitives that enable these features. The Active Security module enforces security policies on guests at runtime. UMX is the system console multiplexer.

Virtual Compute
The BHV’s core use case is as a virtualization platform. This functionality is provided by the BedRock Virtual Machine Monitor (VMM), which virtualizes a single unmodified computer. Although the code is complex and tied to the BHV’s application programming interface (API), the top-level specification is simple: the BedRock VMM is the correct implementation of a bare-metal computer. We call this property the BedRock bare-metal property (trademarked).

Formally, the bare-metal property states that the BedRock VMM is a timing-insensitive refinement of the hardware specification. (Note that the FM work on the BHV currently focuses on the ARM architecture, but other architectures will be supported in the future.) Informally, this means that anything that happens when running a guest on the BedRock VMM could happen on a compliant system. Guests act as if they are running on isolated bare metal and achieve a similar level of security guarantee. This enables consolidation without the added risk of lateral (cross-VM) attacks on the virtualization platform.

The bare-metal property is enhanced by BedRock Active Security (registered in the U.S. Patent and Trademark Office), which acts as a runtime monitor for virtual machines. In terms of a bare-metal guarantee, an active-security-enabled guest runs on a standard processor with (potentially guest-specific) security extensions, e.g., register protection or write-execute mutual exclusion. With this specification, security-compliant guests cannot distinguish an active-security-enabled VMM from hardware. Guests that violate the policy, however, see a machine that includes the runtime security monitor.

Virtual Networking/Communication
Leveraging the modularity of the BHV architecture, we extend the single-computer correctness condition to multiple computers using a virtual network switch. The virtual switch (vSwitch) enables guests to communicate
with one another using the virtual I/O device network protocol.2 As in the VMM, the software implementation enables expressive and dynamic policies to be enforced in the vSwitch. The architecture of the vSwitch occurs repeatedly throughout the BHV, e.g., in the console multiplexer (UMX) and other services. The commonality enables abstraction and reuse within our implementation.

An Extensible, Distrusting Platform
The BHV supports running unverified applications side by side with verified ones, without compromising the integrity of the verified applications. Without this requirement, we could verify a weaker specification that requires a disciplined use of the API. However, these weaker specifications are insufficient when adversarial code might be running on the platform. To address this issue, the BHV’s top-level specification uses an operational semantics based on process calculi in which untrusted processes, such as guests, are modeled by their machine-level behavior. Although verbose in some cases, operational semantics enables us to model untrusted code as simply “what the bits say.”

Supporting unverified applications is crucial in practice because it enables a path to verified systems rather than mandating an all-or-nothing mentality. This enables both “preview” releases, which may contain unverified functionality, and the ability to support customer applications, which have not been verified.

FM at BedRock
Achieving formal guarantees at the level of the BHV places demanding requirements on our FM techniques. In contrast to many code-based verifications, which focus on verifying a single application, establishing the correctness of the BHV requires reasoning across multiple programs coordinating through low-level mechanisms, such as shared memory. The need to interoperate with unverified applications limits the assumptions that we can make across these boundaries, which sometimes requires us to return to first principles when developing a verification strategy. Consider, for example, shared memory queues. When both parties are trusted, a server can assume that clients follow the protocol; however, when the client could be malicious, the server is still obligated to behave correctly to ensure that the bad actor cannot compromise the guarantees of well-behaved clients.

These constraints require our techniques to apply at the machine-code level while still enabling the use of language-specific reasoning principles in contexts that we prove are well behaved. The presence of unverified components adds complexity to an already challenging problem, but is needed to enable the transition to a verified software stack.

Language-agnostic, system-level software verification demands a unifying formalism, one that can uniformly talk about the values of program variables, the state of hardware devices, application-level protocols, and more. Further, verification techniques must support open-world reasoning, which means that the proofs of individual threads can be combined with the proofs of other threads (and arbitrary contexts) to establish a whole-system guarantee.

Separation logic3 satisfies these exacting requirements. Modern separation logic, as embodied by the Iris library,4 is built around first-class resources that can be owned by executing entities, e.g., threads, and invariants, which provide a mechanism for atomically sharing these resources between threads. At the heart of separation logic is the separating conjunction (written P * Q), which implicitly expresses disjointness of the resources in P and Q. Disjointness is the right default for building compositional systems, and the succinct manner of expressing it in separation logic leads to elegant and modular specifications. Although using resources to reason about memory is standard practice, the generality of resources means that we can use them to track other kinds of system state, such as kernel objects and the state of hardware devices. Because separation logic treats resources uniformly, it is also possible to define and compose abstractions that encapsulate resources of distinct types, e.g., those that bundle program variables and kernel resources.

BRiCk—A Program Logic for C++
To verify C++ programs, we need formal reasoning principles for the language fragment that the programs use. These rules are codified in BRiCk,5 the BedRock C++ program logic. BRiCk builds on the Clang/LLVM6 compiler front end and uses its source-level abstract syntax to represent C++ programs.

BRiCk axiomatizes reasoning principles for each type of syntax node. We justify these reasoning principles informally by appealing to the C++ standard7 and academic work formalizing aspects of both C8 and C++.9 The choice to formalize C++ axiomatically, rather than operationally, is primarily a pragmatic one: it enables us to easily underspecify language constructs, grow the supported feature set over time and, we believe, accurately model the standard.

Automation for BRiCk
Automation is crucial to scaling program verification to large and evolving code bases. BedRock Systems’ automation for BRiCk is built around the mental model of symbolic debugging, where the current state
is expressed formally in separation logic and the core automation interprets program fragments against this state. To be understandable, the automation must preserve program-specific abstractions as much as possible. Reaching into a complex invariant to justify a read may enable the verification to make progress, but the resulting state is often incomprehensible to clients who wish to remain insulated from the definition of the invariant. To achieve this, library developers write, and prove, “hints” encapsulating common reasoning patterns that are not immediately obvious from the code. These hints cover coding patterns sanctioned by the library writer and are applied automatically when clients follow these guidelines. Deviating from these coding patterns leads the automation to get stuck, but in an understandable state that facilitates debugging.

BedRock Systems’ automation also provides more manual tactics for reasoning about language constructs that deserve special attention. For example, loop invariants are notoriously difficult to find automatically, so we provide tactics to specify loop invariants manually. Our collection of tactics also includes some more specialized tactics for reasoning about common coding patterns such as compare-and-swap (CAS) loops and foreach loops. Beyond their general usefulness, these tactics also document reusable reasoning patterns. We have also experimented with tactics that apply more aggressive heuristics. However, we tend to prefer slightly more verbose (but maintainable) proof scripts, as these more aggressive tactics can sometimes fail in unpredictable ways.

Design-Time FM
Although it is tempting to think of verification as another step added to the end of the software development process, we have found that this approach misses out on much of the value that FM can provide. Undoubtedly there is value in the final proof, but FM insights can dramatically improve code quality even before we verify the code. Beyond improved code quality, design-level FM often produces easier-to-verify code that is more future proof. This is especially important because it avoids expensive refactorings of both the code and the proof. FM reviews are an important input to the final verified product, but systems engineers generally also find that such reviews help to clarify high-level design thinking.

We consider two instances, among many, where design-time interaction with systems engineers clarified an existing design and influenced a refactoring. In the design phase, many of the insights of FM could have come equally from architectural and design reviews. The added value of FM lies in the ability to carry the insights forward, from high-level discussion down to code-level specification and proof artifacts. Many insights of FM are inspired by abstractions that work well in functional programming, e.g., higher-order functions such as folds. The expressivity of higher-order separation logic allows us to use these abstractions in low-level C++ code, e.g., modeling a C++ class as an effectful function rather than by its first-order representation, which insulates developers from later extensions.

Locking Protocols in the VMM
Hardware virtualization is a complex task with a demanding specification. In the VMM, the problem is exacerbated by the fact that much of the “code” that we are working with is not under our control. Specifically, our verification cannot make any assumptions about the guest code, its memory-access patterns, or its use of synchronization.

When virtualizing a multicore machine, proper synchronization is crucial. Even though our implementation seeks to be as efficient as possible, it is sometimes necessary to pause the virtual CPUs to give the virtualization layer a snapshot of the system that it can inspect in a stable way. We refer to this pausing as round up; one module, typically Brass, requests that the virtual CPUs of a guest synchronize to prevent memory access during a certain period of time.

Specifying round up informally (even with a working implementation) turned out to be difficult for two technical reasons: first, virtual CPUs can run either on the hardware or in our software instruction emulator; second, both virtual CPUs and virtual devices can access the memory.

Ultimately, separation logic provided a precise and extensional explanation that was understandable by both systems and FM engineers. The central idea was to model a memory “lock” token mediating access to the guest’s memory. Full ownership of this token guarantees exclusive access to the guest’s memory regions. Partial ownership allows shared access which, while still safe, does not provide atomicity guarantees. When leveraging hardware virtualization, the program has no fine-grained control over the guest’s actions, so the hardware thread owns a portion of the lock token (recall that the guest manages its own synchronization). The semantic condition of taking a step on behalf of the guest shows that the memory token is also needed by the instruction emulator. When a guest virtual CPU is not running on the hardware or being emulated in software, it can relinquish its portion of the token, thus ceding its ability to access memory. The notion of delegatable ownership is a central tenet of separation logic and is a useful strategy when modeling systems such as this one. By expressing the ability to do something as a first-class resource, we avoid the need to think about all the entities that could do it.
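
The token discipline just described can be approximated at runtime with a reader-writer lock. The following is a minimal C++17 sketch under that assumption, not BedRock's implementation or its Coq specification. The names `GuestMemoryGate`, `vcpu_step`, `round_up`, and `round_up_demo` are hypothetical, and `std::shared_mutex` stands in for the logical token: a shared lock plays the part of partial ownership, and an exclusive lock plays the part of full ownership. A real VMM cannot wrap guest execution in a C++ lock like this; the sketch only shows the ownership pattern.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical runtime analogue of the memory "lock" token (not BedRock code).
// Shared lock ~ a partial share: concurrent access, no atomicity guarantees.
// Exclusive lock ~ full ownership: no vCPU can touch guest memory meanwhile.
class GuestMemoryGate {
public:
    // One vCPU step, whether on hardware or in the instruction emulator.
    // The share is held only for the step and then returned, modeling a
    // vCPU that relinquishes its portion of the token when it is idle.
    template <typename Step>
    void vcpu_step(Step&& step) {
        std::shared_lock<std::shared_mutex> share(token_);
        step();
    }

    // Round up: block until every outstanding share has been returned,
    // then inspect guest memory while holding full ownership.
    template <typename Inspect>
    void round_up(Inspect&& inspect) {
        std::unique_lock<std::shared_mutex> full(token_);
        inspect();
    }

private:
    std::shared_mutex token_;
};

// Usage sketch: four hypothetical vCPUs each update only their own slot of
// "guest memory" under a share of the token, so shared access is race-free.
inline int round_up_demo() {
    constexpr int kVcpus = 4;
    constexpr int kSteps = 1000;
    GuestMemoryGate gate;
    std::vector<int> guest_memory(kVcpus, 0);
    std::vector<std::thread> vcpus;
    for (int id = 0; id < kVcpus; ++id) {
        vcpus.emplace_back([&gate, &guest_memory, id] {
            for (int i = 0; i < kSteps; ++i) {
                gate.vcpu_step([&] { guest_memory[id] += 1; });
            }
        });
    }

    // A round up taken while vCPUs are still running sees a stable snapshot,
    // although it may reflect any amount of progress.
    int early = 0;
    gate.round_up([&] {
        for (int v : guest_memory) early += v;
    });
    assert(early >= 0 && early <= kVcpus * kSteps);

    for (auto& t : vcpus) t.join();

    // Once all vCPUs have stopped, round up observes every completed step.
    int total = 0;
    gate.round_up([&] {
        for (int v : guest_memory) total += v;
    });
    return total;
}
```

Because each vCPU returns its share after every step, round up only has to wait for in-flight steps to finish. In the separation-logic model, by contrast, the token is purely logical state that the proof splits and hands between the hardware thread and the instruction emulator.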

Architecting Services
IPC is a staple of interprocess coordination in microkernel-based architectures. NOVA10 provides a fast, intracore IPC mechanism, and the BHV builds higher-level concepts of services and sessions to enable sharing kernel resources such as memory and semaphores. The stateful, reactive nature of services is a fundamentally different development paradigm than the “direct” programming model; care must be taken when developing applications like these, especially when clients should not always be trusted. Code reviews revealed that many early services suffered from the same mishandling of subtle situations such as session lifetime. Further, the ad hoc development of many services meant that we lost opportunities to share code.

To support the reactive programming model, we developed a code- and specification-level template for writing and specifying verifiable services. The abstraction encapsulates session management by decoupling connect and disconnect and requiring developers to think about disconnect logic under arbitrary states of the protocol. Code that fits into the template is highly regular and has a much higher chance of being specifiable than more ad hoc code. Further, the library completely encapsulates the subtle logic around session lifetime that plagued earlier code. Specifying this abstraction turned out to be quite difficult because certain operations, such as connect and disconnect, are silent in the BHV. Our approach uses purely logical “callbacks” that enable servers to logically set up services before the application is notified. These callbacks can transition the service’s specification state, and the code can “catch up” when it learns of the new connection in the first message. This approach enables us to provide an abstraction that is much easier to reason about, and to hide the implementation and proof details from users.

The specification here relies on sophisticated features of concurrent separation logic but, in the end, the abstraction is fairly intuitive and it enables a simple programming model. Situations such as these, which have occurred several times to date across our codebase, highlight the value of separation logic as a means to simplify code and proofs.

Verification Engineering
Although it is common for design reviews to uncover misunderstandings and bugs, being certain the code is bug free requires verifying that it satisfies its specification. At BedRock Systems, we have successfully verified production code at all levels of our virtualization stack. Beyond pure C++ libraries, our verification also covers user-space device drivers (a serial driver and a direct memory access driver), a concurrent terminal multiplexer, and portions of other applications and NOVA.

Developing Proofs
After the specification is developed and the code is written (not always in that order), BedRock Systems’ process prescribes a detailed code review with an FM engineer. Together, the two (or more) develop an informal proof that the code satisfies the specification. In most cases, this involves a sketch of the “class invariant” and a Hoare outline for the nontrivial functions provided by the class. These outlines directly connect the specification and the code: there is no additional layer of modeling.

This review is often an iterative process, and it is essential that we can get through a cycle rapidly. A common approach is to formulate a class invariant and then expand and refine it as we incorporate new bits of functionality in successively greater detail. In practice, much of this iteration is carried out over a less formal medium, e.g., a (virtual) whiteboard. Separation logic resources are often nicely conceptualized graphically, and we find that sketching boxes and moving them around can be very helpful to explore abstractions and implementations.

Once we have covered the core functionality of the class, we formalize the definitions. Generally, this process is rather straightforward, but it relies on good working knowledge of the verification tool (Coq, in our case). At this level, we are choosing specific data representations, such as whether to use a list or a finite map, how to express the relationship between an array in C++ and its length, and so on. Many of these problems can be solved prescriptively, e.g., “always use arrayR to represent an array”; however, we do not claim to have all of the right answers yet. But as we verify more code, we refine patterns and create new ones. Beyond providing a codified best practice, patterns such as these also enable us to narrow the focus of the automation’s development.

When code does not fit within our existing abstractions, we look to expand them, develop new ones, or rework the code to fit within them. Although rewriting code may seem to indicate that our techniques are not up to the challenge, we note that developers often prefer simplified code, and on many occasions, very subtle bugs have been found around these points. Mathematically, separation logic can scale to arbitrarily complex code, but keeping reasoning simple is often the better path in the long term. When proposing code changes, we always consider runtime costs, readability considerations, and limitations (or enablements) of the new code.

Bugs
Throughout verification we found and fixed a number of bugs across all parts of our stack. Although many bugs are found during testing, we have determined that
concurrency bugs, resource leaks, and error-handling maintenance. Robust proof automation and appropri-
logic are especially difficult to test and are therefore often ate abstractions mitigate the burden to some degree,
caught by code reviews or during proofs. For example, but do not scale to all interface- and specification-level
in developing our shared-pointer library, we ran a sig- refactorings. In these situations, the cost is unavoidable:
nificant number of randomized tests without uncov- significant algorithmic changes will require fundamen-
ering several bugs caught during the FM code review. tally different proofs.
Another instance of a subtle logic bug was a synchroni- Beyond guaranteeing bug freedom, one benefit of
zation issue within NOVA, which could cause incorrect proofs over testing is that they are hyperlocal. Proof
continuation to be used when switching threads (execu- failures tell you exactly the point where the code may
tion contexts in NOVA). In these instances, and many be broken as well as giving you the symbolic state
others involving concurrency, we found that state-based that is problematic. This information is especially use-
reasoning, which focuses on what is true in a particular ful in tracking down concurrency “Heisenbugs” that
state, is more useful than trace-based reasoning, which occur infrequently and are therefore difficult to test
describes the operations that took place to arrive at a and reproduce.
given state, for zeroing in on problems. When working on improving automation, checking
Although the previous two bugs were found during proofs in CI is crucial as proofs of code double as test
the FM review phase, in other instances, our reviews cases for automation. Timing statistics from CI runs
missed subtle bugs that were ultimately uncovered as provide useful information around automation perfor-
we formalized the proof. In the UMX, we uncovered a mance. Line-count statistics of new proofs are a good
synchronization issue that would occur if a client dis- first-order signal of the effectiveness of the automation
connected at precisely the right time during data for- because verbose proofs often point to shortcomings in
warding. Ultimately, this bug could cause data loss, automation, although it is important to factor in com-
but reliably triggering it in a testing scenario would be plexity of the underlying code as well.
extremely difficult.
Beyond logical bugs, FM code reviews and proofs Extensible Automation
uncovered portability and standards compliance issues The need to understand and maintain proofs requires
within our code. Portability bugs often arise in code that we express them at a high level. Making this possible
that implements low-level data marshaling and might for larger C++ programs that evolve over time requires
rely, e.g., on the endianness of the system or the abil- that we keep the proofs small by automating the admin-
ity to perform unaligned reads and writes. Although istrative reasoning necessary to complete a proof. To
strict adherence to the C++ standard may seem overly facilitate high-level reasoning, our automation is post-
pedantic, we believe that it is the only viable path forward in the long term. Optimizing compilers crucially rely on undefined behavior to enable optimizations, and noncompliant code can result in bugs at higher optimization levels that are difficult to track down because they do not exist in debug builds. The C++ standard is the contract between developers and compiler writers; if developers need something that the standard does not provide, the standard needs to be expanded to provide it.

Proof Maintenance. Keeping proofs in sync with code is essential to maintaining high quality through refactorings. At BedRock Systems, our continuous integration (CI) checks that all proofs succeed before any merge to the main branch. Overall, we have not found this to be particularly burdensome as well-designed verified code tends to be fairly stable. When code changes are small (and correct), our automation is often able to discharge new obligations automatically, and no changes to the proof scripts are necessary. Inevitably though, more complex changes, especially those that affect class invariants and concurrency protocols, require manual proof

facto extensible using stylized reasoning principles that we call hints. Hints are justified once and applied automatically by automation whenever the situation merits it. These hints enable us to reason at a natural level of abstraction while also insulating clients from some of the more technical details of specifications and code.

We contrast this semiautomatic reasoning with more manual reasoning traditionally provided by interactive theorem provers and the Iris Proof Mode (IPM) [11]. The IPM provides fine-grained context management and low-level primitive tactics for reasoning about separation logic formulas. This sort of reasoning is ideal for subtle proofs that require tricky resource management; our metatheory leverages this verification approach extensively. When verifying C++ programs, however, we find that large parts of the proof are “follow-your-nose” proofs. Indeed, in many instances, the program is effectively the proof, and the proof is merely bookkeeping. In these circumstances, it is ideal to teach the automation to follow its own nose so that the verification engineer can focus on subtle aspects of the verification.

We offer two instances where custom, but reusable, hints accelerated proof development. The first arose
when verifying higher-level specifications on top of lower-level ones for the microkernel. When verifying the microkernel, we need to provide specifications that are maximally distrusting of applications running atop it. We achieve this by providing low-level, “undisciplined” specifications that can support arbitrary, especially concurrent, usage. In practice, however, these specifications can be both difficult to read and program against. On top of these specifications, we can prove simpler specifications that hide details when the API is used in a more restricted setting. As a simple example, if we know that a capability must refer to a semaphore object, then we do not need to consider error codes from the microkernel that correspond to capability mismatches. Deriving the well-behaved specifications from the unsafe ones can be onerous, but it is generally not complicated, and the tedium of this task can be alleviated by a small number of generic insights that are easily expressed within our hint infrastructure. Using these hints, we reduced the size of some proofs by more than half, which greatly increased the readability and maintainability of the proofs as the specifications evolved. Ultimately, the proofs became fairly close to the proof outlines written by experts because the automation was extended with the experts’ strategy for reasoning.

A second instance where hints were able to abstract low-level details arose when working with arrays, and especially with array initialization and destruction. For modularity purposes, BRiCk’s semantics describes array initialization compositionally by initializing each array cell sequentially. Although this provides a clear specification, using this approach becomes quite costly when working with large arrays. Providing special hints for default-initializing primitive arrays enabled us to fuse many reasoning steps together, resulting in more natural descriptions of the program state. Our hints can also codify patterns for reasoning about array accesses in a natural way by decomposing (and recomposing) a large array into locations of interest and the rest of the array. These sorts of “borrows” constitute a large part of the administrative reasoning necessary in low-level C++ programs, and because we can express these borrowing patterns generically, the automation can usually churn through array reasoning with relatively little manual intervention.

Industrial Programming Languages
One of the largest sources of complexity in verification is not the code we write but the language in which we write it. C++, and modern programming languages in general, are both large and complex and pose formal reasoning challenges in and of themselves. Although these challenges must be addressed, we note that they arise on a per-language rather than a per-program basis, so most FM practitioners need not worry about them. Further, BRiCk already addresses these issues for the fragment of C++ that it supports.

Supporting Large Languages. The size of modern languages means that we must formalize them incrementally. To facilitate this, BRiCk’s semantics is directly expressed as a program logic, rather than as a derived logic on top of an underlying operational semantics. This approach allows us to leverage the built-in modularity of separation logic to modularize our semantics. It also makes it natural to underspecify the semantics of particular language constructs, a property that is essential early on and still useful when working with multiple related languages, e.g., C++14 and C++17.

Although many language features can be desugared to simpler primitives, we avoid this when possible. This is partly for soundness, as some transformations only refine the high-level specification; however, there are also reasons to support the sugar natively. Consider virtual functions in C++. In theory, we could desugar these to tables of function pointers, but doing so (even abstractly) would expose reasoning principles not justified by the C++ standard. Further, desugaring the construct would require all developers using the construct to reason about the desugaring, which is clearly undesirable. Supporting the feature directly not only keeps us closer to the standard but also enables us to build opinionated abstractions and automation for the use cases of virtual functions.

Supporting Sophisticated Language Features. Industrial languages also have features that are difficult to reason about. The archetypal difficult language feature in C++ is the object model, which is front and center in C++ semantics. The concurrent memory model is another cross-cutting feature, albeit one more limited in scope, because regular C++ variable accesses must be data-race free. Although developers often think of C++ pointers as virtual addresses, the C++ language puts significantly more structure on them. This additional structure lets optimizers reason more aggressively about programs but comes at the cost of more bookkeeping in formal proofs. For example, our semantics tracks pointer provenance to rule out undefined behavior that arises from low-level pointer manipulation.

Although not pervasive, the interoperation between C++ and assembly is another necessary part of low-level programming. Beyond just giving a semantics for assembly, we are forced to answer questions such as, “What is the effect of sharing memory between multiple programs?” Empirically, we know what compilers do, but
the standard text is silent on many of these low-level questions.

To avoid these issues in the short term, we make judicious, simplifying assumptions that seem to hold in practice. For example, BRiCk assumes that deallocating memory does not invalidate pointers to it, an assumption not sanctioned by the standard. This choice requires that we add liveness side conditions to certain operations to avoid obvious unsoundnesses. Researchers suggest [12] that this assumption is necessary (in C) to support common, low-level programming idioms and is therefore likely justified by compilers in practice.

Managing FM Teams
In the past few years at BedRock Systems, we have experimented with a few different approaches for managing FM work. In this section, we discuss some of the lessons that we learned. We underscore that the value of FM is directly correlated with its pervasiveness. When FM are involved early and regularly throughout development, things tend to go smoothly. Delaying FM involvement until the end makes the work more difficult to plan accurately and, ultimately, makes it harder to prove the code correct.

At a high level, our experience suggests that managing FM teams is not fundamentally different from managing “normal” development teams. At a lower level, we found that focused efforts exercising specifications from both the client and implementation sides are highly effective at delivering high-quality, reusable, and verified code. We refer to this approach as spike-based verification because it is built around focused verticals. In the next section, we focus on our spike-based verification efforts on the UMX, a multithreaded service that implements a console multiplexer.

[Figure 2 diagram. Labels: Client 1, Client 2 (Use); Sketch Spec, Revise Spec, ...; Prove; Implementation of the Module.]
Figure 2. The lifecycle of a specification (spec). We verify code in spikes that address both users and implementers of specified code, iteratively refining specs until they are both realizable and useful. Later clients benefit from usability improvements that occur in previous verification cycles.

The UMX Spike
The UMX spike was planned from the top down, from a top-level specification to the implementation, but generally completed from the bottom up, from application dependencies to the top-level specification. During the initial planning, we identified two high-level components: the control plane, which interacts with clients, and the data plane, which forwards data.

Rather than splitting the team equally between these components, we opted to focus first on the data plane and then the control plane. Consolidating resources improved collaboration and resulted in timely and constructive feedback. For example, the team was able to identify a key missing abstraction around string literals early on and developed preliminary automation for them that was immediately used by the rest of the team.

It is useful to note that the separation of clients and implementation via a formal specification generally enables a greater degree of parallelism than is possible in traditional software engineering. Client verification can start even before an implementation exists and certainly before a proof is completed. We find that our goal of automatable abstractions tends to insulate client code from shallow, specification-level changes, so client proofs can be adapted (relatively quickly) when underlying interfaces change.

The control plane verification did not proceed as smoothly as the data plane verification due to unforeseen complexities in two components: the service library and the use of shared memory. In both cases, the underlying issue stemmed from subtle complexities and insufficient expert bandwidth. This is especially problematic at external interfaces, where the behavior of clients is largely unconstrained and therefore sometimes difficult to conceptualize. In these circumstances, having experts on hand is essential, and even with them, it is sometimes challenging to estimate the difficulty of a task before you are already deep in it.

Although experiences like these do arise, they occur less frequently than one might think. Bleeding-edge work often comes with risks and slowdowns, but in many cases, this is not fundamentally different from state-of-the-art software engineering. As in that context, it is crucial to avoid early overgeneralization and scope creep. We have found that the combination of spike-based verification and agile-style sprints helps with this. Rather than solving problems in a vacuum, engineers focus on real use cases attached to real code. With a specific use case in mind, a good-enough solution can often take the place of the “perfect” solution and enable forward progress (see Figure 2). New use cases (and fresh eyes) often inspire new insights that can be used to generalize existing specifications. In practice, the proof burden introduced by generalizing an interface is often
significantly reduced by “adapter” hints provided to our extensible automation.

The scrum approach to FM is highly effective at transferring skills among developers. Short, daily sync meetings helped to connect more and less experienced developers. The focused scope of the group also greatly reduced the context-switching overhead in collaboration and resulted in both faster and better feedback across the board.

Hiring and Training for FM
Spike-based formal verification improves onboarding, but hiring for FM is still difficult. There are few candidates with general FM expertise and even fewer with expertise in specialized areas such as concurrent separation logic. Universities that teach FM to undergraduates help close the gap, but we have found that teaching FM “on the job” is necessary. Even Ph.D.s with deep experience in separation logic require training to transition from academic FM (which often focuses on smaller programs and meta-level issues) to industrial program verification. Over time, this transition period at BedRock Systems has shortened, and we believe that good training material and exposure through spike-based verification will further reduce the overhead.

When hiring general software engineers, we have found that candidates with a functional programming background are able to pick up FM much more readily than those without such exposure. In part, this is attributable to the fact that Coq is built around a functional programming language (Gallina), but we believe that it is more than that. Functional programming languages tend to focus on minimalism, which seems to train developers to more quickly separate the core problem from the noise surrounding it. This skill is transferable not only to specification writing but also to the design of good interfaces.

Incorporating FM into all aspects of the software development lifecycle, from software system design to implementation to code maintenance, has the potential to revolutionize the software industry. But making pervasive FM a reality requires solving deep technical and nontechnical challenges, many of which we have begun to address at BedRock Systems.

On the technical side, we see the refinement of language standards and the improvement of automation as crucial, and the barriers in both areas are beginning to fall. Reasoning about industrial languages such as C++ is necessary but raises difficult problems in semantics, especially around complicated corners of standards. This is an active area of research, and increasingly, standards committees (especially the C standard committee) are seeing the value in it. Engineering verification to scale to complex industrial code bases means building automatable, but also highly expressive, program logics. BedRock Systems’ automation for C++ supports a hybrid of highly automated reasoning when possible and deliberate reasoning when necessary. Both are essential as programs grow not only in size but also in complexity.

On the nontechnical side, it is crucial to expand proficiency in FM across the board. Specialized FM experts will still be necessary, but we believe that much of the knowledge required for verification could be made accessible to undergraduates. Mainstream interest in Rust and the growing popularity of functional programming languages both suggest that attitudes toward new technologies are changing in a positive way for FM. In the meantime, on-the-job training can compensate for a lack of general knowledge.

References
1. Y. Bertot and P. Castéran, Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. New York, NY, USA: Springer-Verlag, 2010.
2. M. S. Tsirkin and C. Huck, “Virtual I/O Device (VIRTIO),” OASIS Virtual I/O Device (VIRTIO) Technical Committee, Version 1.1. [Online]. Available: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
3. J. Reynolds, “Separation logic: A logic for shared mutable data structures,” in Proc. 17th Annu. IEEE Symp. Logic Comput. Sci., 2002, pp. 55–74, doi: 10.1109/LICS.2002.1029817.
4. R. Jung et al., “Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning,” ACM SIGPLAN Notices, vol. 50, no. 1, pp. 637–650, 2015, doi: 10.1145/2775051.2676980.
5. “BedRock systems/BRiCk.” GitHub. https://github.com/bedrocksystems/BRiCk (Accessed: Nov. 30, 2021).
6. C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis and transformation,” in Proc. Int. Symp. Code Generation Optimization, San Jose, CA, USA, Mar. 2004, pp. 75–86, doi: 10.1109/CGO.2004.1281665.
7. Information Technology—Programming Languages—C++, ISO/IEC 14882:2011.
8. R. J. Krebbers, “The C standard formalized in Coq,” Ph.D. dissertation, Radboud University Nijmegen, Nijmegen, The Netherlands, 2015.
9. T. Ramananandro, G. Dos Reis, and X. Leroy, “Formal verification of object layout for C++ multiple inheritance,” SIGPLAN Notices, vol. 46, no. 1, pp. 67–80, Jan. 2011, doi: 10.1145/1925844.1926395.
10. U. Steinberg and B. Kauer, “Nova: A microhypervisor-based secure virtualization architecture,” in Proc. 5th Eur. Conf. Comput. Syst., Association for Computing Machinery, 2010, pp. 209–222, doi: 10.1145/1755913.1755935.
11. R. Krebbers, A. Timany, and L. Birkedal, “Interactive proofs in higher-order concurrent separation logic,” SIGPLAN Notices, vol. 52, no. 1, pp. 205–217, Jan. 2017, doi: 10.1145/3093333.3009855.
12. P. E. McKenney et al., Pointer Lifetime-End Zap, ISO/IEC JTC1/SC22/WG21 P1726R0, 2019. [Online]. Available: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1726r0.pdf

Gregory Malecha is the director of formal methods at BedRock Systems, Inc., San Mateo, California, 94401, USA. His research interests include formal verification, automation, and programming languages. Malecha received a Ph.D. in computer science from Harvard University. Contact him at gregory@bedrocksystems.com.

Gordon Stewart is a formal methods lead at BedRock Systems, Inc., San Mateo, California, 94401, USA. His research interests include formal verification and compiler correctness. Stewart received a Ph.D. in computer science from Princeton University. Contact him at [email protected].

František Farka is a senior formal methods engineer at BedRock Systems, Inc., San Mateo, California, 94401, USA. His research interests include logic in computer science, type theory, and proof search. Farka received a Ph.D. in computer science from the University of St Andrews and Heriot-Watt University. Contact him at frantisek@bedrocksystems.com.

Jasper Haag is a formal methods engineer at BedRock Systems, Inc., San Mateo, California, 94401, USA. His research interests include formal verification. Haag received a B.S. in computer science from the Massachusetts Institute of Technology. Contact him at [email protected].

Yoichi Hirai is a senior software engineer at BedRock Systems, Inc., San Mateo, California, 94401, USA. His research interests include modal logics for knowledge and concurrency. Hirai received a Ph.D. in computer science from the University of Tokyo. Contact him at yoichi@bedrocksystems.com.

Digital Object Identifier 10.1109/MSEC.2022.3172005