Persistence For The Masses: RRB-Vectors in A Systems Language
Relaxed Radix Balanced Trees (RRB-Trees) are among the latest members of a family of persistent tree-based
data structures that combine wide branching factors with simple and relatively flat structures. Like the battle-
tested immutable sequences of Clojure and Scala, they have effectively constant lookup and updates, good
cache utilization, but also logarithmic concatenation and slicing. Our goal is to bring the benefits of func-
tional data structures to the discipline of systems programming via generic yet efficient immutable vectors
supporting transient batch updates. We describe a C++ implementation that can be integrated in the runtime
of higher level languages with a C core (Lisps like Guile or Racket, but also Python or Ruby), thus widening
the access to these persistent data structures.
In this work we propose (1) an Embedding RRB-Tree (ERRB-Tree) data structure that efficiently stores
arbitrary unboxed types, (2) a technique for implementing tree operations independent of optimizations for a
more compact representation of the tree, (3) a policy-based design to support multiple memory management
and reclamation mechanisms (including automatic garbage collection and reference counting), (4) a model of
transience based on move-semantics and reference counting, and (5) a definition of transience for confluent
meld operations. Combining these techniques, performance comparable to that of mutable arrays can be
achieved in many situations, while using the data structure in a functional way.
CCS Concepts: • Software and its engineering → Data types and structures; Functional languages; •
General and reference → Performance;
Additional Key Words and Phrases: Data Structures, Immutable, Confluently Persistent, Vectors, Radix-Balanced,
Transients, Memory Management, Garbage Collection, Reference Counting, Design Patterns, Policy-Based
Design, Move Semantics, C++
ACM Reference Format:
Juan Pedro Bolívar Puente. 2017. Persistence for the Masses: RRB-Vectors in a Systems Language. Proc. ACM
Program. Lang. 1, ICFP, Article 16 (September 2017), 28 pages.
https://doi.org/10.1145/3110260
1 INTRODUCTION
Immutability enables safe lock-free communication between concurrent programs. Persistence facilitates
reasoning about change and is fundamental to higher-level reactive and interactive systems. A growing
development community is increasingly interested in these properties, motivated by the horizontal scaling
of our computing power and the increased expectations that a wider and more diverse consumer base is
putting on user interfaces. Many traditionally object-oriented programming languages are turning to a
multi-paradigm approach embracing core functional programming concepts. It is data structures that enable
immutability and persistence at the scale of real-world production systems.
1.1 Challenge
Implementations of persistent data structures exist for various languages, both functional and
otherwise. However, few attempts have been made to implement them in a language without a
managed runtime or pervasive garbage collection. There are good motivations to try though. First,
the systems programming community is adopting many techniques and principles from functional
programming, as shown by Rust [Matsakis and Klock 2014] and the latest developments in recent
C++ standards. Second, a sufficiently general and efficient implementation could be integrated in
the runtime of higher level languages (Lisps like Guile or Racket, but also Python or Ruby come
to mind), allowing a wider audience to enjoy the benefits of these data structures. Doing so poses
various challenges.
(1) Efficient persistent data structures require garbage collection. Without automatic garbage
collection provided by the runtime, a reference counting reclamation scheme may be used.
Doing so efficiently is challenging. Furthermore, when integrated in a runtime, it should be
possible to leverage the garbage collector it provides, if one exists.
(2) Most immutable data structures are designed to store boxed values—i.e. pointers to objects
allocated in the free store.1 Many performance-critical applications require embedding the
values in the data structure for better cache locality.
(3) An immutable interface may not always interact well with other components of a language
that is not fundamentally functional. Furthermore, performance oriented systems developers
might want to temporarily escape immutability when implementing transactional batches
of updates to a data structure.
1.2 Contributions
We describe an implementation of RRB-Tree based vectors with transience in C++. We overcome
these challenges making the following contributions:
(1) The Embedding RRB-Tree (ERRB-Tree) data structure that efficiently stores arbitrary un-
boxed types (§ 3).
(2) A tree traversal technique based on mutually recursive higher order position/visitors. It can
be used to implement tree operations independently of optimizations that achieve a more
compact representation of the tree (§ 4).
(3) A policy-based design to support multiple memory management and reclamation mecha-
nisms (including tracing garbage collection and reference counting) (§ 5).
(4) A model of transience based on reference counting (§ 6.2) and move-semantics (§ 6.3). These
optimize updates by sometimes performing them in-place even when the containers are
used in a functional style, making the performance profile depend on the actual dynamic
amount of persistence in the system.
(5) A definition of transience for all RRB-tree operations, including confluent meld operations (§ 6.4).
(6) An evaluation of the state of the art of Radix Balanced vectors, comparing it to various
configurations of our implementation. This includes a discussion of the effects of reference
counting, challenging the established assumptions on its suitability for the implementation
of immutable data structures (§ 7).
Our implementation is libre software and freely available online.2
1 Some implementations do embed basic numeric types, but since they have sizes similar to that of a pointer, they do not
need to adapt the core algorithms.
2 Immer: https://sinusoid.es/immer
2 BACKGROUND
2.1 Radix Balanced Trees
Radix Balanced Trees are inspired by Array Mapped Tries [Bagwell 2000] and were first introduced
in the implementation of immutable vectors in Clojure [Hickey 2008]. They provide a good balance
between structural sharing, random access complexity, and efficient cache utilization.
8 Steady: https://github.com/marcusz/steady, Immutable++ https://github.com/rsms/immutable-cpp
Fig. 1. A B = 2 Radix Balanced Tree containing the 23-element string "hola␣habib,␣wie␣geht’s?". The
access path down to index 9 is shown in red.
As illustrated in figure 1, this is a tree structure where every inner node and leaf has M = 2^B
slots (branches or elements respectively), where B is the number of branching bits that characterizes the tree.
The rightmost path of the tree may contain nodes with fewer than M slots. Every other node has
exactly M slots and is considered full.
2.1.1 Radix Search. We may locate the vector element at index i in a radix balanced tree t by
traversing down the tree, where for every subtree t′ with height h(t′), we descend to its child at
offset ⌊i / M^h(t′)⌋ mod M (radix search). Since M is a power of two, we can simply divide the vector
element index into h(t) groups of B bits and use each group to navigate the corresponding level of the tree. A common
optimization is to keep track of the value shift(t) = B × h(t) by storing it at the root. This denotes
the depth of the tree while avoiding multiplications in tree traversals, using comparatively cheap
bit-wise operations instead. Interestingly, an indexing mechanism very similar to radix search is
commonly used to map virtual memory addresses to hardware addresses using multi level page
tables [Drepper 2008].9
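As an illustration, radix search with the shift optimization can be sketched as follows. The node layout and element type are simplified stand-ins assumed for this example, not the library's actual code:

#include <cstddef>

constexpr int B = 5;                            // branching bits
constexpr std::size_t M    = std::size_t{1} << B;
constexpr std::size_t mask = M - 1;

struct node {
    node* inner[M];                             // children, when the node is an inner node
    int   leaf[M];                              // values, when the node is a leaf
};

// Descend from the root consuming B bits of the index per level. `shift`
// starts at B * h(root), as stored at the root, and reaches 0 at the leaves,
// so no multiplications are needed during the traversal.
int lookup(const node* n, std::size_t i, int shift) {
    while (shift > 0) {
        n = n->inner[(i >> shift) & mask];
        shift -= B;
    }
    return n->leaf[i & mask];
}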
A Radix Balanced Tree can also be considered a trie. If we think of numbers represented in base
M as strings with an alphabet of size M, an immutable vector of size n is a trie containing every
key in the range [0, n) if we add left padding to the keys so they are evenly sized. This trie has a
depth h(t) = log_M(n) and thus lookup runs in logarithmic time.
2.1.2 Branching Factor. We partially reproduced the results from [Bagwell and Rompf 2011;
Hickey 2008; L’orange 2014; Stucki et al. 2015] claiming that M = 32 (B = 5) is a sensible choice
on modern architectures.10 With such a high branching factor, elements are stored in contiguous
blocks spanning a few cache lines; thus the CPU caches are used effectively and the structure
can be iterated fast. Furthermore, such a tree containing 2^32 elements has a depth of only 7; that is, it contains
every element addressable using 32-bit integers, yet any element can be found in only 7 steps. When
dealing with lots of data, other factors (like working set size vs. cache size) have an impact orders
of magnitude larger than the depth of the tree. For this reason, in practice, it is useful to think
9 In fact, the way Linux implements virtual memory via copy-on-write can be thought of as a massive persistent vector,
where memory pages are leaf nodes, page tables are inner nodes, and each fork() creates a new version of the data
structure. So we have the operating system managing memory as massive persistent vectors at one end, while at the other
end language users build their programs out of small persistent vectors and hash-tables. We could fill the sandwich by adding
a persistent-vector-based memory allocator/garbage collector, fulfilling the wild fantasies of the authors—maybe a future
work proposition?
10 Informal experiments using our implementation show that the values 4, 5, or 6 are all sensible, having different impacts on
different operations. The optimal choice depends on the particular hardware, workload, and memory management strategy.
Fig. 2. A B = 2 RRB-Tree containing the 23-element string "hola␣habib,␣wie␣geht’s?". The access path
down to index 19 is shown in red. At level 2, an extra step is taken in order to find the right subtree.
of radix search complexity as constant. As such, it is often advertised that Radix Balanced based
vectors support effectively constant random access.
2.3 Optimizations
2.3.1 Transience. By default, persistence is achieved in RRB-Trees via immutability. Because
data is never modified in-place, updates involve copying the whole path down to the updated
element. While this is desired most of the time, it is overkill when a function produces many
intermediate versions that are immediately discarded.
Clojure proposes a pair of polymorphic functions (transient v) and (persistent! t). The
first one returns a mutable (i.e. transient) version of the immutable collection v, for which mutat-
ing operations do exist. The second returns an immutable snapshot of the transient collection t
Fig. 3. A B = 2 RRB-Tree with off-tree tail containing the 23-element string "hola␣habib,␣wie␣geht’s?".
The access path down to index 22 is shown in red. Note that since the element lies in the tail, the tree does
not need to be traversed.
and invalidates t. Both operations are O(1). Updates on a transient collection are done in-place
whenever possible, doing copy-on-write to ensure the immutability of the adopted contents. We
further discuss transience in § 6.
2.3.2 Off-tree Tail. When using immutable vectors, users may expect fast appends, as is the
case for mutable vectors. The most common optimization when implementing Radix Balanced
Trees is to keep the rightmost leaf off-tree and point to it directly from the root, as shown in
figure 3. In most cases only the tail needs to be updated to append a new element. Every M
appends the tail becomes full; it is then inserted into the tree and a new tail is created. Appends become
significantly faster this way and our implementation includes this optimization.
2.3.3 Left Shadow. We saw that the rightmost path may contain nodes with fewer than M slots. A
way to think about this is to consider that every Radix Balanced Tree t virtually contains M^h(t)
elements. When the size of the vector is not a power of M, the rightmost M^h(t) − size elements
are null. The representation is compressed by not storing all these empty rightmost branches in
memory. This is the right shadow of the tree.
Instead of projecting our vector elements onto the indices [0, size) of the tree, we could as well use
some other [first, last) range, thus creating another shadow on the left. In this way, it is possible to
implement effectively O(1) drop and prepends with regular Radix Balanced Trees. This optimiza-
tion is implemented in Scala [Stucki et al. 2015]. We chose not to implement this optimization, but
we are considering it for future work.
2.3.4 Display. The display is a generalization of the off-tree tail mechanism that leverages the
spatiotemporal locality of updates. The core idea is to establish a vector element as the focus. The
whole path down to this index (the display) is kept off-tree and stored directly at the root. It was
found that by using an exclusive-or (xor) operation between two indexes it is easy to find their
common ancestor in the tree. Updates close to the focus become faster because only the sub-path
after the common ancestor needs to be updated. Vector manipulations change the focus to the
index where the update occurs. In this way, sequential updates become amortized O(1). We experi-
mented with this optimization for a while, but decided not to include it in our final implementation,
because:
(1) The root node becomes bigger because it stores a whole path. This means that if we store
it on the stack or unboxed in some container, it becomes more expensive to copy it: both
because of the bigger size, but also because the reference counts of all nodes in the display
need to be touched.
(2) Implementation complexity is increased. Some assumptions are invalidated (e.g. now some
references inside the tree are null when they belong to the display) adding further condi-
tional checks and overhead to operations that do not benefit from the display.
(3) Transients are an alternative way of improving the performance of sequential updates. While
less general and pure, they provide better performance when applicable (§ 7.2.4).
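For illustration, the xor-based common-ancestor computation mentioned above might look as follows; this is a sketch of the mechanism only, not code from our implementation (which does not include the display):

#include <cstddef>

constexpr int B = 5;

// The highest group of B bits set in (i ^ j) tells at which level the paths
// to indexes i and j diverge, i.e. the height of their lowest common
// ancestor; 0 means both indexes live in the same leaf.
int common_ancestor_height(std::size_t i, std::size_t j) {
    std::size_t x = i ^ j;
    int h = 0;
    while (x >>= B)
        ++h;
    return h;
}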
Fig. 4. A B = 2, Bl = 1 ERB-Tree containing the 11-element string "hola␣habib!". The access path down
to index 9 is shown in red. Every element is twice the size of a pointer (we could imagine we are storing a
UTF-32 string in a 16-bit architecture) and the contained objects are embedded directly in the leaves.
Under the new structure, when looking for the vector element at index i, the leaf containing it is
the ⌊i/Ml⌋-th one, and the offset within that leaf is i mod Ml. Thus, shift(t) is now defined as:

    shift(t) = Bl + B × (h(t) − 1)   if h(t) > 0
    shift(t) = 0                     if h(t) = 0        (1)
Special care has to be taken to accommodate the base case of recursive tree traversals to this new
shift definition. Most algorithms are otherwise very similar to those of the original RRB-Trees.
Listings 3.1a and 3.1b compare potential C++ definitions of naive and embedding RRB-Trees and
their respective random access operations.
3.3.2 Choosing Bl . The question remains: what are the best values for B and Bl ? Intuitively, we
expect the answer to depend on sizeof T . Experimentally we can show that B = 5 remains valid,
but Bl should be chosen such that leaf nodes are similar in size to inner nodes. The size of inner
nodes depends only on B, since the size of a data pointer (sizeof ∗ ) is usually fixed for a given CPU
architecture. For a given B and T , we may derive the branching bits at the leaves as:
    Bl′(B, T) = ⌊ log2( (sizeof ∗ × 2^B) / sizeof T ) ⌋        (2)

Note that the choice to floor the result might seem rather arbitrary. It ensures that sizeof T ×
2^Bl ≤ sizeof ∗ × 2^B, that is, that leaf nodes are at most as big as inner nodes. This property enables
reusing buffers used to store all kinds of vector nodes across different value types (§ 5.3).
Also, we see that Bl′ = 0 when sizeof T > sizeof ∗ × 2^(B−1). In this case, leaf nodes contain only
one element and no bits are used to address into it. Such an ERRB-Tree is equivalent to an RRB-Tree
with the same B (off-tree tail optimization not considered).
In our C++ implementation, both B and Bl can be customized by passing template arguments.
This allows the user to optimize the data structure for architectures not considered by the authors,
or to reproduce the results here described. Otherwise, they default to B = 5 and Bl = Bl′(B, T). The
latter is derived at compile time when the container is instantiated for a given element type T.
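A minimal sketch of such a compile-time derivation of equation (2) follows; it is illustrative only and the helper names are made up, not the library's actual code:

#include <cstddef>

// Floor of log2, usable in constant expressions.
constexpr int log2_floor(std::size_t n) {
    return n <= 1 ? 0 : 1 + log2_floor(n / 2);
}

// Equation (2): leaf branching bits derived from B and the element type, so
// that leaf nodes are at most as big as inner nodes.
template <typename T, int B = 5>
constexpr int leaf_bits = log2_floor((sizeof(void*) << B) / sizeof(T));

// With 64-bit pointers and B = 5: leaf_bits<double> == 5 (pointer-sized
// elements) and leaf_bits<char> == 8, so leaf and inner nodes are both
// 256 bytes in size.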
4 MINIMIZING METADATA
4.1 Incidental Metadata
A naive implementation of RRB-Trees stores the following two pieces of metadata in every node:
template <int B> constexpr auto M = 1u << B;
template <int B> constexpr auto mask = M<B> - 1;
// ...
private:
    union node {
        node* inner[M<B>];
        T leaf[M<B>];
    };
    const T& get_(size_t i, const node* n, int s) const {
        return s == 0
            ? n->leaf[i & mask<B>]
            : get_(i, n->inner[(i >> s) & mask<B>], s - B);
    }
};
(a) Naive Radix Balanced Tree

template <int B> constexpr auto M = 1u << B;
template <int B> constexpr auto mask = M<B> - 1;
// ...
private:
    union node {
        node* inner[M<B>];
        T leaf[M<BL>];
    };
    const T& get_(size_t i, const node* n, int s) const {
        return s == BL - B
            ? n->leaf[i & mask<BL>]
            : get_(i, n->inner[(i >> s) & mask<B>], s - B);
    }
};
(b) Embedding Radix Balanced Tree
(1) The type of node. That is, whether it is a leaf or inner node (regular or relaxed). This may
happen implicitly—i.e. it is added by the compiler when using runtime polymorphism, in the
form of a pointer to a v-table, type tag, or alike.
(2) The number of slots in the node. In many languages this is added implicitly too—for example,
Java arrays provide a length member.13
13 Whether this adds an overhead depends on the implementation. Very often, the free store needs to know the size of the
node anyways, but this information is lost in abstraction in languages like C or C++.
14 Has a tiny advantage for some operations, a tiny disadvantage for others.
no generic reference counting mechanism can be used.15 However, by doing reference counting
manually as part of the algorithms, we achieve further performance gains by avoiding redundant
operations.
4.3.2 Slot Counts. Most implementations still do store the number of slots either directly or
indirectly. In most cases, they use the array length. When transients are supported, mutable nodes
keep room for extra slots, so the length attribute is not accurate. In that case, implementations
may set the extra slots to null. But both the array length and null markers are redundant to the
extent that we can derive the number of slots in a node from other information.
Remember that following radix search (§ 2.1.1), we may define a function that, for a given height
h, computes the offset of the slot containing the vector index i:
    offset(i, h) = ⌊i / M^h⌋ mod M        (3)
In a regular tree of size s > 0, we can then derive the number of slots in a node t at height h(t)
that contains the vector index i:
    slots(t) = M                          if i < M^h(t) × offset(s − 1, h(t) + 1)
    slots(t) = offset(s − 1, h(t)) + 1    otherwise        (4)
That is, when a node is in the rightmost path, its number of slots is the offset just past the last element
of the vector; otherwise the node is full. This computation is slow in comparison to just querying
the number of elements of the array. But it can be implemented efficiently by leveraging the
contextual information we have during the traversal. For example, in a push_back() operation that
appends a new element we know that we are traversing the rightmost branch. Thus, no conditional
is needed and the slot count can be computed using fast bitwise operations. For operations that
traverse the tree towards arbitrary indexes of a vector, such as an update(i, v) that changes the
value of the i-th element to v, we can separate the traversal of full and rightmost nodes as in
listing 4.1.
Listing 4.1. An update() ERB-Tree operation without type or slot count metadata
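Listing 4.1 itself is not reproduced here. The following is a rough, simplified sketch of the idea under stand-in types; the node layout, the copy_node helper, and the int elements are assumptions of this sketch, not the paper's code:

#include <algorithm>
#include <cstddef>

constexpr int B = 5;
constexpr std::size_t M    = std::size_t{1} << B;
constexpr std::size_t mask = M - 1;

struct node {
    node* inner[M];   // children, when the node is an inner node
    int   leaf[M];    // values, when the node is a leaf (kept separate for simplicity)
};

node* copy_node(const node* n, std::size_t slots) {
    auto r = new node{};
    std::copy(n->inner, n->inner + slots, r->inner);
    std::copy(n->leaf,  n->leaf  + slots, r->leaf);
    return r;
}

// Inside the tree every node is full: the slot count is always M and no
// conditionals are needed.
node* do_update_full(const node* n, std::size_t i, int shift, int v) {
    auto newn = copy_node(n, M);
    auto off  = (i >> shift) & mask;
    if (shift == 0) newn->leaf[off] = v;
    else newn->inner[off] = do_update_full(n->inner[off], i, shift - B, v);
    return newn;
}

// Along the rightmost path the slot count is derived from the vector size
// (equation (4)); all children except the last one are known to be full.
node* do_update(const node* n, std::size_t i, std::size_t size, int shift, int v) {
    auto slots = (((size - 1) >> shift) & mask) + 1;
    auto newn  = copy_node(n, slots);
    auto off   = (i >> shift) & mask;
    if (shift == 0)
        newn->leaf[off] = v;
    else if (off + 1 < slots)
        newn->inner[off] = do_update_full(n->inner[off], i, shift - B, v);
    else
        newn->inner[off] = do_update(n->inner[off], i, size, shift - B, v);
    return newn;
}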
While efficient, this code suffers from significant duplication. The do_update and do_update_full
functions are identical except for (1) the computation of the slot count, and (2) the recursive
call down to the next child. The problem gets worse when we introduce relaxed nodes in
RRB-Trees and binary operations like concatenation. To solve this problem we introduce the notion
of positions.
template <typename Pos, typename T>
auto visit_regular(update_op op, Pos pos, size_t i, T v) {
    auto newn = copy_regular(pos.node, pos.count());
    newn->inner[pos.offset(i)] = pos.towards(op, i, v);
    return newn;
}

template <typename Pos, typename T>
auto visit_leaf(update_op op, Pos pos, size_t i, T v) {
    auto newn = copy_leaf(pos.node, pos.count());
    newn->inner[pos.offset(i)] = v;
    return newn;
}
All the bit wizardry is hidden in the positions, and the traversal can be optimized without chang-
ing the visitor. The operation is not concerned anymore with the details of how to navigate through
the tree. Instead, it focuses on what it needs to do to each node in order to produce a new structure.
No combinatorial explosion happens between the types of nodes and the types of positions. When
implemented in this way, changing an RRB-Tree into an ERRB-Tree structure is simple.
This code is also as performant as the hand-written traversal without positions. Note that all
dispatching is done statically. The recursive visitor takes the positions as a template argument, and
it is instantiated at compile time for all possible positions in the traversal. When instantiating the
visitor in listing 4.2 against the position framework in appendix A, a call graph as shown in figure 5
is produced. All definitions are visible to the compiler allowing inlining and other optimizations.
Fig. 5. Instantiated call graph for the update() operation using positions over an (E)RRB-Tree. The
instantiated functions are update_relaxed<relaxed_pos>, update_regular<regular_full_pos>,
update_regular<regular_pos>, update_leaf<leaf_full_pos>, and update_leaf<leaf_pos>.
5 MEMORY MANAGEMENT
A garbage collection mechanism is required for immutable persistent data structures to be effective.
As shown, when an operation updates the data structure, new nodes are allocated to copy the
changed parts of the data structure, but the new value also references parts that did not change.
Eventually, when old values are not used anymore, a node may lose all references to it and its
memory should be recycled.
16 Workarounds exist for this problem. With libgc one can use GC_malloc_uncollectable instead of std::malloc where
memory is managed manually.
17 http://docs.racket-lang.org/inside/im_memoryalloc.html
18 https://docs.python.org/3/extending/
objects.19 It is sometimes thought of as a compile-time version of the strategy pattern [Gamma et al.
1995]. A good factoring into policies enables extensibility without a performance penalty and is
thus useful to let the user make their own trade-offs when configuring performance-sensitive
aspects of some type. Listing 5.1 shows two policies for reference counting as implemented in our
system.
struct refcount_policy {
    mutable std::atomic<int> rc{1};
    void inc()
    { rc.fetch_add(1, std::memory_order_relaxed); }
    bool dec()
    { return 1 == rc.fetch_sub(1, std::memory_order_acq_rel); }
    void dec_unsafe()
    { rc.fetch_sub(1, std::memory_order_relaxed); }
    bool unique() const
    { return rc.load(std::memory_order_acquire) == 1; }
};

struct no_refcount_policy {
    void inc() {}
    bool dec() { return false; }
    void dec_unsafe() {}
    bool unique()
    { return false; }
};
Listing 5.1. Two reference counting policies. The refcount_policy enables thread-safe reference counting
via an atomic integer count. The no_refcount_policy is a no-op policy to be used when some other
garbage collection mechanism is available. Our system provides an additional unsafe_refcount_policy for single-
threaded systems.
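As a hedged illustration of how such a policy might be plugged in, consider the sketch below. The template parameter and type names are made up for the example; the actual library exposes this configuration through its memory_policy (§ 5.3):

template <typename T, typename RefcountPolicy>
struct basic_vector {
    // Nodes mix in the policy: with refcount_policy each node carries an
    // atomic counter; with no_refcount_policy the calls below compile to
    // no-ops (e.g. when a tracing collector such as libgc owns the memory).
    struct node : RefcountPolicy {
        // ... node layout as in listing 3.1 ...
    };

    basic_vector() = default;
    basic_vector(const basic_vector& other) : root_{other.root_} {
        if (root_) root_->inc();            // sharing a value is O(1)
    }
    ~basic_vector() {
        if (root_ && root_->dec()) { /* last owner: release the tree */ }
    }

    node* root_ = nullptr;
};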
Fig. 6. Inner node layouts. Regular inner nodes do not store the size of the subtrees, thus the relx pointer is
null. A relaxed node may or may not have the size array allocated in the same memory object. In the former
case, the size array needs its own meta to track reference counts or transient ownership. In the latter case,
the size array can not be shared between nodes, making the update() slower, but improving traversals and
reducing allocations in other update operations.
6 TRANSIENCE
6.1 Background
While RRB-Trees perform very well for a persistent data structure, they are suboptimal when per-
sistence is not required. This happens when writing pure functions that perform multiple updates
to a data structure but only return the last version. From the point of view of the caller, the
function is a transaction and we can only observe the accumulated effects of the whole opera-
tion. We may persist its inputs and outputs, but intermediate results produced inside the function
are necessarily forgotten when it returns. An example of such a suboptimal operation is shown in
listing 6.1a, which defines an iota(v, f, l) that appends all integers in the range [f, l) to the
immutable vector v.
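Listing 6.1a is not reproduced here; the following is a minimal sketch of the kind of function it describes, assuming a persistent vector type whose push_back returns a new value:

vector<int> iota(vector<int> v, int f, int l) {
    for (auto i = f; i < l; ++i)
        v = v.push_back(i);   // every step copies a whole path down to a leaf
    return v;
}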
Clojure solves this problem by introducing the notion of transients. A transient container can be
constructed in O(1) from its persistent counterpart by merely copying a reference to its internal
data. However, the transient has different semantics, such that update operations invalidate the
container supplied as argument. This allows transient operations to sometimes update the data-
structure in-place, stealing the passed in tree and mutating its node objects. Still, the transient
has to preserve the immutability of the original persistent value whose contents it adopted at the
beginning of the transaction.
6.1.1 Copy on Write. To be able to track which nodes it can mutate in place, the transient is
associated with a globally unique identifier that is generated at the beginning of the transaction.
We then proceed using a copy-on-write strategy. In a transient update operation, before mutating
a node in place, we check whether the node is tagged with the current transient identifier—i.e. we
check if the transient owns the node. Only if it owns the node is it allowed to modify the node
in-place—otherwise a new copy is made, tagged with the transient identifier. Update operations
do this for every node in the path down to the affected leaves, thus making sure the mutations
have no visible effects outside of the transient. A transient can be converted back into a persistent
value, invalidating the transient. Thanks to this invalidation, the operation can be performed in
O(1) without cleaning up the tags—a new transient is going to have a different identifier, so it is
impossible for newer transients to mutate those nodes that are remain tagged with the identifier of
the finished transient. In our system, the transience policies (see appendix B for an implementation)
describe how these identifiers are created and compared.
6.1.2 Interface. In Clojure, transient operations are not referentially transparent, even though
functions that only use them internally can be so.20 However, transients can be modelled in a pure
language supporting affine types [L’orange 2014; Walker 2005].
In C++ it feels natural to encapsulate transients behind an explicitly mutable interface as exem-
plified in listing 6.1b. This makes the invalidation of the previous state more obvious, hopefully
lowering chances for programming mistakes. Also this interface is compatible with generic stan-
dard library algorithms and components which expect mutable containers. In this way, persistent
vectors become a first class citizen of the language. In section § 6.3 we present an alternative
interface that does not sacrifice a functional style.
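Listing 6.1b is likewise not reproduced; a sketch of the transient-based variant might read as follows, assuming transient() and persistent() member functions in the spirit of the Clojure pair described in § 2.3.1 (the exact names are an assumption of this sketch):

vector<int> iota(vector<int> v, int f, int l) {
    auto t = v.transient();       // O(1): adopts v's nodes, v stays valid
    for (auto i = f; i < l; ++i)
        t.push_back(i);           // mutates nodes owned by the transient in place
    return t.persistent();        // O(1): freezes the result, invalidates t
}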
Algorithm 6.1. Structure of a transient operation. The unique(node) expression tests whether a node’s refer-
ence count is 1, and owner (node) returns the identifier associated to the transient owning the node.
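The algorithm body is not reproduced here; the ownership check its caption describes boils down to something like the following sketch, where node, edit_id, copy_node, and set_owner are hypothetical stand-ins and only unique() and owner() come from the caption:

// Return a node that is safe to mutate within the transaction `id`.
node* mutable_node(node* n, edit_id id) {
    if (unique(n) || owner(n) == id)   // sole owner, or already tagged by us
        return n;
    auto c = copy_node(n);             // copy-on-write
    set_owner(c, id);                  // tag the copy so later steps may reuse it
    return c;
}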
the passed in objects need to be deeply copied when deriving new results from the passed in val-
ues, the T&& moves the passed in contents into the results, hence the name move-semantics for this
general approach.
A truly anonymous r-value could in theory only be used once, thus forming a special kind of
affine type. Affine types were used in the Rust programming language to model fully type-safe
move-semantics that, unlike C++'s, are not based on r-value references [Matsakis and Klock 2014].
In C++ r-values references are not truly unique aliases because (1) the destructor is uncondi-
tionally called at the end of its lifetime, even after it was moved, and (2) the programmer can also
cast a named variable (an l-value) to an r-value using the std::move() function. Even though an
object is left in an unspecified state after being moved from, it is valid in the sense that the de-
structor must still succeed. The compiler will not check that a named object that has been moved
from is not used again. Furthermore, it is common to define the assignment operator such that
moved-from objects can be reassigned to give them a specified and known state.
6.3.2 R-value Transients. These semantics allow us to say that an r-value of a persistent con-
tainer is a transient r-value. For every operation in the container, an overload for r-value references
is provided that optimizes updates using the transient rules. Listing 6.2 shows two examples where
transient r-values are manipulated.
Listing 6.2. Manipulating r-value transients. On the left, every push_back call is applied to temporaries and
thus the transient rules are used to perform in-place updates without any intervention from the programmer.
On the right, by explicitly moving v into the push_back call, this implementation of iota has the same
performance as that in listing 6.1b.
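The listing body is not shown here; based on its caption, the two situations look roughly as follows (a sketch under the same assumed vector type as the previous sketches):

#include <utility>

// Left: every intermediate result is a temporary, so the r-value overload
// of push_back may reuse its nodes automatically.
auto v = vector<int>{}.push_back(1).push_back(2).push_back(3);

// Right: iota in a functional style; std::move turns v into an r-value so
// that each push_back can update in place when the nodes are not shared.
vector<int> iota(vector<int> v, int f, int l) {
    for (auto i = f; i < l; ++i)
        v = std::move(v).push_back(i);
    return v;
}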
The applicability of this technique is not exclusive to C++. Notably, Rust is a systems program-
ming language that includes type safe move semantics (borrowing) modelled after affine types, but
its lack of overloading means that the programmer would always need to explicitly note which ver-
sion (transient or persistent) of an operation to pick—even though the compiler would most often
protect them from picking the wrong one. We dream of a language that provides the transparency
of overloaded r-value references (like C++), while removing the burden of explicitly moving vari-
ables when they are last used by using stronger and more type safe scoping rules (like Rust).
6.3.3 Ownership Tracking. Since l-values can be converted into r-values, we still need to track
node ownership to ensure we do not modify aliased nodes. When using identifier-based ownership
tracking, we would need to generate a new identifier whenever we alias a persistent vector (in
C++ terms, whenever we copy it). This means that in-place modifications would happen only from the
second operation that is applied consecutively on an r-value transient chain. Only programs with
lots of batch updates would benefit from this—for most programs, we speculate that the cost of
generating new identifiers all the time outweighs the potential gains.
However, reference counting keeps an accurate count on the aliasing of a node all the time
anyways. No extra runtime cost is required to optimize updates on r-values, with potentially very
high gains. Our implementation uses r-value transience whenever reference counting is enabled.
This can be controlled by the user via the use_transient_rvalues switch of the memory_policy
(§ 5.3).
6.3.4 Costs. To enable in-place push back, the rightmost path of the tree needs to keep enough
space allocated in the nodes to write the new data. In the worst case, this has an M × log_M(n) memory
overhead. While this is negligible for non-trivial vectors, it might have an impact on an application
that uses lots of small vectors where n ≪ M. A solution could be to use an exponential growth
mechanism for rightmost nodes, similar to that used for non-persistent vectors. Sometimes the
programmer may as well just use immutable arrays when n < M.
we can define the following three corresponding transient operations when combined with its
transient counterpart α τ!:

    ⊕l! : α τ! × α τ  → α τ!
    ⊕r! : α τ  × α τ! → α τ!        (6)
    ⊕b! : α τ! × α τ! → α τ!
These are produced by taking a transient in either or both arguments. These operations naturally
map to an r-value based interface, but it can also be modelled using mutable l-values, as shown
in listing 6.3. Note that in the l-value version the last two methods are fundamentally the same,
provided for symmetry.
struct vector_transient {
    // ...
    void append(vector const& rhs);
    void prepend(vector const& rhs);
    void append(vector_transient&& rhs);
    void prepend(vector_transient&& rhs);
};

struct vector {
    // ...
    friend vector&& operator+(vector const& lhs, vector&& rhs);
    friend vector&& operator+(vector&& lhs, vector const& rhs);
    friend vector&& operator+(vector&& lhs, vector&& rhs);
};
6.4.3 Transient Concatenation. Implementing transient concatenation does not require one to
implement the three meld transient functions separately. The trick is to carry three owner identi-
fiers, that of the left side, that of the right side, and that of the new center nodes. When a node is
created or adopted it is assigned the center identifier, which is the identifier of the resulting tree.
The center identifier should be that of the left tree for ⊕l! and that of the right tree for ⊕r! . For
⊕b! , we can choose either side based on some heuristic—we propose taking that of the bigger tree.
Algorithm 6.2 provides a general skeleton for transient meld operations.
Note that transient concatenation is only barely faster than normal concatenation. The meat of
the algorithm happens in the rebalancing step. In the transient version, objects are reused when
possible (saving allocations) but the data needs to be moved or copied around anyway. In order to
reuse nodes we add complexity (more branching, recursion parameters, etc.) that expends most
of the cycles we save by avoiding allocations.
Furthermore, implementing this operation with manual reference counting is particularly tricky
because we destroy the old tree as we go, making it hard to keep an account of which nodes to
inc() or dec(). Note that with reference counting we can instead keep a free-list of recently
deallocated objects (and we do with the default memory policy)—we save allocations yet keep a
simpler recursion, often being faster. Also, with reference counting, the transient concatenation
algorithm is not required to keep an account of transient ownership. Thus, when using reference
counting, our implementation concatenates small trees in-place when only the tail is affected, but
resorts to the persistent concatenation algorithm for the general case.
However, we do enable transient concatenation when automatic garbage collection and id-based
ownership are used. Otherwise we would not tag the produced and adopted nodes with the iden-
tifier of the new owning transient. Failing to do so pessimizes updates later performed on the
resulting transient value.
7 EVALUATION
7.1 Methodology
We evaluated various implementations (table 1) by running several benchmarks in a specific sys-
tem (tables 2 and 3). We run each benchmark for three problem sizes N (normally, this is the size
of the vector). For practical reasons, we take fewer samples for bigger problem sizes (table 4).
We run C/C++ benchmarks using the Nonius framework,22 Clojure benchmarks using Criterium,23
Python using PyTest.Benchmark,24 and Scala using ScalaMeter.25 The first two frameworks are
based on the Haskell Criterion framework,26 which introduces interesting statistical bootstrap-
ping methods for the detection of outliers. The rest also do some form of outlier detection. All of
them do appropriate measurement of the clock precision and run each benchmark enough times
per sample to obtain significant results. The JVM based frameworks take care of minimizing the
impact of the garbage collector in the measurements, as well as ensuring that the code is properly
JIT-compiled [Georges et al. 2007]. In C++, we manually trigger a full libgc collection before each
benchmark to avoid remaining garbage from previous benchmarks impacting the performance. We
also considered disabling the garbage collector during the measurement, but this is impracticable
for big problem sizes.
7.2 Results
The benchmark results are presented in tables 5 to 8.
22 https://nonius.io/
23 https://github.com/hugoduncan/criterium/
24 https://github.com/ionelmc/pytest-benchmark
25 http://scalameter.github.io/
26 http://hackage.haskell.org/package/criterion
Table 1. Evaluated implementations.

ours/gc: Our ERRB-Tree implementation with tracing garbage collection using libgc.
ours/safe: Our ERRB-Tree implementation with thread-safe reference counting using atomic counters. Memory is allocated using standard malloc, stacked under a global free list of up to 1024 objects (using lock-free synchronization with atomic pointers), stacked under a thread-local free list of up to 1024 objects (not synchronized). In some benchmarks we disable the free list and name it ours/basic.
ours/unsafe: Our ERRB-Tree implementation with thread-unsafe reference counting. Memory is allocated using malloc, stacked under a global free list of up to 1024 objects (not synchronized).
ours/py: Python bindings for our ERRB-Tree implementation written directly against the Python C interface.a These provide a Python Vector class able to hold any dynamically typed Python object. It uses Py_Malloc to allocate internal nodes, thread-unsafe reference counting to track these, and collaborates with the Python garbage collector to trace the contained objects.
librrb: The C implementation by L’orange [2014] (librrb). It uses libgc for garbage collection and contains boxed objects. In our benchmarks, when storing integers, we just reinterpret these as pointers to store them unboxed—this should give more comparable results.
clojure: Clojure standard vectors implementing tail-optimized RB-Trees (clojure). It is written in Java.
clojure.rrb: The Clojure implementation of RRB-Trees. It is written in Clojure.
scala: Scala standard vectors implementing RB-Trees with display.
scala.rrb: Scala implementation of RRB-Trees with display by Stucki et al. [2015].
pyrsistent: A Python implementation of RB-Trees. They provide an implementation in C by default, but also an implementation in Python for systems where C modules are not available.

a We also experimented with the C++ frameworks Boost.Python and PyBind11, but these add too much overhead.
Table 2. Hardware used for the benchmarks.
Processor: Intel Core i5-460M (64 bit)
Frequency: 2.53 GHz
L1 Cache (per core): 2 × 32 KB (8-way assoc.)
L2 Cache (per core): 256 KB (8-way assoc.)
L3 Cache: 3 MB (12-way assoc.)
RAM: 4 GB DDR3 (1,066 MHz)

Table 3. Software used for the benchmarks.
OS: Linux 4.9.0 (Debian)
Compiler: gcc 6.3.0
Java: openjdk 1.8.0-121
Python: cpython 2.7.3
Scala: scala 2.11.11
Clojure: clojure 1.8.0

Table 4. Problem sizes and number of samples.
Small (S): N = 10^3, 100 samples
Medium (M): N = 10^5, 20 samples
Large (L): N = 10^7, 3 samples
7.2.1 Abstraction Cost. Our implementation uses various abstraction mechanisms (§ 4.4, § 5.2).
We argued that these are zero-cost abstractions—or may even incur a negative cost when used
to remove metadata. We can evaluate this by comparing our implementation with librrb and
pyrsistent, since both are written in C using similar optimizations (off-tree tail). Our Python
bindings are faster than pyrsistent in all benchmarks. Our implementation (when combined
with libgc) is faster than librrb in most benchmarks excepting two.
librrb supports faster transient random updates (table 7) and shows a speedup of around 20%,
because their implementation does not support exceptions (it is plain C after all) nor does it recover
gracefully from memory exhaustion. Their update function is thus just a simple loop while, in
order to be exception-safe, our implementation uses non-tail recursion, executing all potentially
failing operations first and only then mutating the tree.
Table 5. Access benchmarks. The sum of all values in an n-element vector is computed. Elements
are accessed either by sequential indexes (e.g. via operator[]), iterators, or via internal iteration (i.e.
a higher order reduce function).

Table 7. Update benchmarks. Every element in an n-element vector is updated using sequential in-
dexes. In the transient version, a mutable interface is used without destroying the initial value.
Table 8 (L column) shows an example in which librrb does faster concatenation, while our im-
plementation seems faster in all others. Note that we are comparing vectors of normal int values.
These are 32 bit in size while pointers are 64 bit wide, thus Bl = 6 because of embedding. The re-
sulting relaxed structures are not the same and in this particular instance we happen to need more
rebalancing at the leaves. When repeating the benchmark controlling for Bl , our implementation
provides a consistent 50% speedup because it uses positions to avoid allocating auxiliary center
nodes.
7.2.2 Abstraction Suitability. One of our main goals was to offer sufficiently customizable mem-
ory management such that our implementation could be integrated in other language runtimes.
In a few hours we had an initial integration with Python, supporting cooperative garbage col-
lection and allocating memory in the idiomatic ways suggested by the interpreter documentation.
These bindings are already faster than pyrsistent, a C implementation manually tailored against
the Python interpreter that implements similar optimizations: it keeps the tail off-tree, uses single
threaded reference counting, and it keeps a free list of recently released nodes.
7.2.3 Embedding Effectiveness. While embedding provides some advantage in most operations,
its benefits are most evident when accessing the elements. In our implementation we provide three
ways of accessing the elements: (1) querying an element by index, (2) using iterators (3) using
reduce (i.e. folding). The three methods are compared in table 5.
When using the reduce method, our implementation is only 50% slower than a fully contiguous
std::vector. This is an excellent result, considering how efficient contiguous arrays are. That
method uses internal iteration—this is, it traverses the data-structure once, applying a given oper-
ation on the elements. Using templates, the operation can be inlined in the traversal.
External iteration adds some overhead. At every step the iterator needs to check if it is done
running through the current leaf, and when it is, it needs to traverse down the tree to find the
next leaf. The problem of external iteration over hierarchical data structures is further discussed
by Austern [2000]. Still, thanks to embedding, iterating over an ERRB-Vector is a few times faster
than iterating over a std::list.
Comparing the access performance across languages is hard to do in a fair way, because other
languages have unrelated costs due to some other features. For example, Clojure’s dynamism taxes
dealing with basic types.27 Still, in spite of the best efforts of the JVM JIT, folding C++ ERRB-Trees
seems orders of magnitude faster than folding in any of the Java-based languages, and random
access is still about twice as fast.
7.2.4 Transience Effectiveness. Looking at tables 7 and 6 we see that transient updates are an
order of magnitude faster than their immutable counterparts.
In fact, for large data sets appends are even faster in a transient ERRB-Vector than in a standard
mutable vector (table 6, column L). We believe that this is due to cache utilization. Even though
a standard mutable vector uses exponential growth to support amortized O(1) appends, in the
growth step it needs to copy all the data. When the vector does not fit in the cache it needs to load
it from main memory. RRB-Vector appends only touch the rightmost path of the tree, which fits
in L1 cache even for huge vectors.
Mutating transient operations are still an order of magnitude slower than on a std::vector. In
this case, the RRB-Tree update cost is dominated by the lookup. Note that we deliberately tested
ordered updates to put our implementation against the wall. We saw in informal experiments that
when performing the updates in random order, a mutable vector slows down an order of magnitude
27 Clojure supports monomorphic vectors for basic types using (vector-of :type). We tried that method and, surprisingly,
found it to actually be slower than generic vectors. Thus we decided not to include it in the presented data.
for big data sets. Another way to bring RRB transient updates closer to those on mutable purely
sequential data would be to provide mutable iterators.
Note that Scala updates are very efficient compared to immutable updates in other im-
plementations. This is because the display optimization (§ 2.3.4) achieves amortized O(1) updates
when applied sequentially. However, the display management adds some cost to other operations.
Transient updates, although less general, are still significantly faster and can optimize non-local
updates.
7.2.5 Considering Reference Counting. While reference counting is the most convenient garbage
collection mechanism for the C++ developer, it is believed to be inadequate for the implementation of im-
mutable data structures. For arbitrarily deep data structures (e.g. lists), it may overflow the
stack when releasing a big object. This is not a problem for RRB-Trees. But an RRB update touches
up to M × log_M(n) reference counts. In a multi-threaded system these updates must be atomic,
adding additional overhead. This is also relatively cache-inefficient, because the reference counted
object needs to be visited in order to update its count. These updates may also cause shadow writes to the
immutable data neighbouring the reference count, limiting the parallelism of concurrent reads.
However, reference counting also opens opportunities. Because it reclaims the memory deter-
ministically, we can put the freed nodes in a free list for future reuse. When performing batches
of immutable updates, not only does this avoid calling the relatively slow malloc, but also reuses
buffers that have been used recently, most probably paged in and in cache. Our benchmarks show
that combining single-threaded reference counting with free lists (even though they are small)
provides the best performance for all manipulation operations.
However, multi-threaded reference counting impacts immutable update performance, adding up
to a 1.5X –2X overhead over using libgc.28 For many use cases this might be tolerable, considering
the gains across all other operations. But more importantly, reference counting enables transient
updates on r-values (§ 6.3). When using move semantics in a disciplined manner across a whole
system, how much data is copied during updates depends on the level of aliasing. In other words:
this opens a continuum between immutability and mutability, where update performance is char-
acterized by the actual dynamic amount of persistence in the system, even when the programmer
uses only immutable interfaces.
We believe that for many realistic workloads this will provide a significant advantage to refer-
ence counting over automatic garbage collection. However, we found no way to design an unbiased
benchmark to test this hypothesis. Still, our implementation provides a good framework to vali-
date this assumption in concrete real world systems. Users can measure the impact of different
memory management configurations in their system and pick the one that fits best.
8 CONCLUSION
We described an implementation of ERRB-Trees in C++. By storing unboxed values and support-
ing transient operations, its performance is comparable to that of mutable arrays while providing
optional persistence, and logarithmic concatenation and slicing. Via generic programming and
policy-based design, different memory management strategies can be used such that the data struc-
tures can be made available to higher level languages through their C runtimes. Also, when using
reference counting and move semantics, all r-values become eligible for transient optimizations.
This effectively blurs the boundaries between immutable and mutable values and enables better
system wide performance without sacrificing a functional programming style.
28 Informal experiments on a more recent Skylake Intel processor show that the gap actually increases in modern machines,
up to a 3X –4X difference.
We showed that a systems programming language is suitable for implementing immutable data
structures. We hope that this helps make these data structures accessible to a wider audience, en-
abling functional architectures to face the challenge of building highly concurrent and interactive
systems.
9 FUTURE WORK
Associative containers. We would like to apply the methodology and techniques developed in
this work to other persistent data structures. Especially interesting are other wide-node trie-based
random access containers, like HAMT [Bagwell 2001] and CHAMP [Steindorfer and Vinju 2015].
We anticipate that the relatively sparse nature of those data structures (compared to RRB-Vectors)
makes some optimizations more costly (§ 6.3.4) and alternatives need to be developed. Also, the
radix-balanced structure could be used to implement persistent compile-time indexed hybrid struc-
tures like those in Boost.Fusion29 or Hana.30
Diffing and patching. Because of structural sharing, comparing persistent data structures is al-
ready relatively efficient. It is interesting to compute the differences between two versions to form
a patch that can be used to reconstruct the more recent version from an older one. Applications
include: serializing a history of transactions to disk or the network, efficiently updating user inter-
faces, or implementing version control [Demaine et al. 2010].
Applications. We shall explore how RRB-Vectors can be used to design novel architectures, be-
yond the obvious ones (e.g document as a value). For example, games often use flat data models
with entities factored out horizontally into subsystems, with components stored in big per subsys-
tem sequences—data-oriented design [Fabian 2013]. RRB-vectors could be used to design persistent
high-performance in-memory data-bases for highly interactive systems.
29 http://boost.org/libs/fusion
30 http://boost.org/libs/hana
A A POSITION FRAMEWORK
B TRANSIENCE POLICIES
ACKNOWLEDGMENTS
We would like to thank the ICFP reviewers for their very valuable feedback. We are grateful to
María Carrasco Rodríguez, Francisco Jerez Plata, Emilio Jesús Gallego Arias, Ryan Brown, Joaquín
Valdivia, Javier Martínez Baena, Antonio Garrido Carrillo, and Raphael Dingé for discussing these
ideas, reviewing early drafts, and their encouragement towards the publication of this work.
REFERENCES
Umut A. Acar, Arthur Charguéraud, and Mike Rainey. 2014. Theory and Practice of Chunked Sequences. Springer Berlin
Heidelberg, Berlin, Heidelberg, 25–36. https://doi.org/10.1007/978-3-662-44777-2_3
Andrei Alexandrescu. 2001. Modern C++ design: generic programming and design patterns applied. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA.
Matthew H. Austern. 2000. Segmented Iterators and Hierarchical Algorithms. In Selected Papers from the International Seminar on Generic Programming. Springer-Verlag, London, UK, 80–90. http://dl.acm.org/citation.cfm?id=647373.724070
Phil Bagwell. 2000. Fast And Space Efficient Trie Searches. Technical Report.
Phil Bagwell. 2001. Ideal Hash Trees. Es Grands Champs 1195 (2001).
Phil Bagwell. 2002. Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays. In Implementation of Functional Languages, 14th International Workshop. 34.
Philip Bagwell and Tiark Rompf. 2011. RRB-Trees: Efficient Immutable Vectors. Technical Report. EPFL.
Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2001. Composing High-performance Memory Allocators.
SIGPLAN Not. 36, 5 (May 2001), 114–124. https://doi.org/10.1145/381694.378821
Hans-Juergen Boehm and Mark Weiser. 1988. Garbage Collection in an Uncooperative Environment. Softw., Pract. Exper.
18, 9 (1988), 807–820. https://doi.org/10.1002/spe.4380180902
Hans-J. Boehm, Russ Atkinson, and Michael Plass. 1995. Ropes: An Alternative to Strings. Softw. Pract. Exper. 25, 12 (Dec.
1995), 1315–1330. https://doi.org/10.1002/spe.4380251203
H-J Boehm, M Spertus, and C Nelson. 2008. N2670: Minimal support for garbage collection and reachability-based leak detection (revised). (2008).
Sébastien Collette, John Iacono, and Stefan Langerman. 2012. Confluent Persistence Revisited. In Proceedings of the Twenty-
third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’12). Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA, 593–601. http://dl.acm.org/citation.cfm?id=2095116.2095166
Erik D. Demaine, Stefan Langerman, and Eric Price. 2010. Confluently Persistent Tries for Efficient Version Control. Algorithmica 57, 3 (July 2010), 462–483. https://doi.org/10.1007/s00453-008-9274-z
Ulrich Drepper. 2008. What Every Programmer Should Know About Memory. Technical Report. Red Hat. http://people.redhat.com/drepper/cpumemory.pdf
J R Driscoll, N Sarnak, D D Sleator, and R E Tarjan. 1986. Making Data Structures Persistent. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (STOC ’86). ACM, New York, NY, USA, 109–121. https://doi.org/10.1145/12130.12142
Richard Fabian. 2013. Data-Oriented Design. (2013). http://www.dataorienteddesign.com/dodmain/dodmain.html
Amos Fiat and Haim Kaplan. 2001. Making Data Structures Confluently Persistent. In Proceedings of the Twelfth Annual
ACM-SIAM Symposium on Discrete Algorithms (SODA ’01). Society for Industrial and Applied Mathematics, Philadelphia,
PA, USA, 537–546. http://dl.acm.org/citation.cfm?id=365411.365528
Matthew Flatt and PLT. 2010. Reference: Racket. Technical Report PLT-TR-2010-1. PLT Design Inc. https://racket-lang.org/tr1/.
Mark Galassi, Jim Blandy, Gary Houston, Tim Pierce, Neil Jerram, Martin Grabmüller, and Andy Wingo. 2002. Guile
Reference Manual. (2002). https://www.gnu.org/software/guile/manual/guile.html
Erich Gamma, Richard Helm, Ralph E. Johnson, and John Vlissides. 1995. Design Patterns. Elements of Reusable Object-
Oriented Software. Addison-Wesley.
Andy Georges, Dries Buytaert, and Lieven Eeckhout. 2007. Statistically Rigorous Java Performance Evaluation. SIGPLAN
Not. 42, 10 (Oct. 2007), 57–76. https://doi.org/10.1145/1297105.1297033
Matthias Grimmer, Chris Seaton, Thomas Würthinger, and Hanspeter Mössenböck. 2015. Dynamically Composing Languages in a Modular Way: Supporting C Extensions for Dynamic Languages. In Proceedings of the 14th International Conference on Modularity (MODULARITY 2015). ACM, New York, NY, USA, 1–13. https://doi.org/10.1145/2724525.2728790
Rich Hickey. 2008. The Clojure Programming Language. In Proceedings of the 2008 Symposium on Dynamic Languages (DLS
’08). ACM, New York, NY, USA. https://doi.org/10.1145/1408681.1408682
Howard E. Hinnant, David Abrahams, and Peter Dimov. 2004. A Proposal to Add an Rvalue Reference to the C++ Language.
Technical Report N1690=04-0130. ISO JTC1/SC22/WG21 – C++ working group.
Ralf Hinze and Ross Paterson. 2006. Finger Trees: A Simple General-purpose Data Structure. Journal of Functional Programming 16, 2 (2006), 197–217.
Haim Kaplan. 2005. Persistent Data Structures. In Handbook of Data Structures and Applications, Dinesh P. Mehta and Sartaj Sahni (Eds.). CRC Press.
Jean Niklas L’orange. 2014. Improving RRB-Tree Performance through Transience. Master’s thesis. Norwegian University of
Science and Technology.
Nicholas D. Matsakis and Felix S. Klock, II. 2014. The Rust Language. Ada Lett. 34, 3 (Oct. 2014), 103–104. https://doi.org/10.1145/2692956.2663188
C. Okasaki. 1999. Purely Functional Data Structures. Cambridge University Press. https://books.google.de/books?id=SxPzSTcTalAC
Aleksandar Prokopec. 2014. Data Structures and Algorithms for Data-Parallel Computing in a Managed Runtime. Ph.D.
Dissertation. IC, Lausanne. https://doi.org/10.5075/epfl-thesis-6264
Jon Rafkind, Adam Wick, John Regehr, and Matthew Flatt. 2009. Precise Garbage Collection for C. In Proceedings of the 2009 International Symposium on Memory Management (ISMM ’09). ACM, New York, NY, USA, 39–48. https://doi.org/10.1145/1542431.1542438
Michael J. Steindorfer and Jurgen J. Vinju. 2015. Optimizing Hash-array Mapped Tries for Fast and Lean Immutable JVM
Collections. SIGPLAN Not. 50, 10 (Oct. 2015), 783–800. https://doi.org/10.1145/2858965.2814312
Michael J. Steindorfer and Jurgen J. Vinju. 2016. Towards a Software Product Line of Trie-based Collections. In Proceedings
of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2016).
ACM, New York, NY, USA, 168–172. https://doi.org/10.1145/2993236.2993251
Nicolas Stucki, Tiark Rompf, Vlad Ureche, and Phil Bagwell. 2015. RRB Vector: A Practical General Purpose Immutable
Sequence. SIGPLAN Not. 50, 9 (Aug. 2015), 342–354. https://doi.org/10.1145/2858949.2784739
D Walker. 2005. Substructural type systems. In Advanced Topics in Types and Programming Languages. The MIT Press.