autobot-mlsys2020
ABSTRACT
Despite the ever-changing software and hardware profiles of modern computing systems, many operating system (OS) components adhere to designs developed decades ago. Given the variety of dynamic workloads that modern operating systems are expected to manage, there is much to gain from adaptive components that learn from data patterns and OS events. However, developing such adaptive systems in kernel space requires implementing from scratch the math and machine learning (ML) primitives that are readily available in user space via widely-used ML libraries, while user-level ML engines are often too costly (in CPU and memory footprint) to be used inside a tightly controlled, resource-constrained OS. To this end, we started developing KMLib, a lightweight yet efficient ML engine targeting kernel-space components. We detail our proposed design in this paper and demonstrate it through a first prototype targeting the OS I/O scheduler. Our prototype's memory footprint is 804KB for the kernel module and 96KB for the user-space library; experiments show that it reduces I/O latency by 8% on our benchmark workload and testbed, a meaningful gain for typically slow I/O devices.
1 INTRODUCTION
Researchers have proposed interesting ideas related to ML for task scheduling (Negi & Kumar, 2005; Smith et al., 1998), I/O scheduling (Hao et al., 2017), and storage parameter tuning (Cao et al., 2018). However, to the best of our knowledge, there is no previous work that attempts to develop an ML ecosystem for operating systems.

KMLib aims to (i) enable easy-to-develop ML applications with low computational cost and memory footprint, and (ii) make it easier to debug and fine-tune ML applications by providing primitives that behave identically in user space and in kernel space. We believe that a library like KMLib could enable numerous ML-based applications targeting operating systems and help us rethink how to design adaptive and self-configuring operating systems.

2 BACKGROUND AND RELATED WORK

While mainstream machine learning libraries like TensorFlow (Abadi et al., 2016) and PyTorch (Paszke et al., 2019) have gained widespread use in research and production, there have also been several attempts to build machine learning libraries that address specific needs. The Embedded Learning Library (ELL) by Microsoft is one example, targeting embedded devices. TensorFlow Lite (TensorFlow Lite) by Google is a library for running machine learning applications on resource-constrained devices. There have also been several proposals for using ML to improve operating systems (Zhang & Huang, 2019).

Researchers have investigated automatically tuning file system parameters (Cao et al., 2018). Because this work performs the optimization in an offline manner, it is not designed to adapt to workload changes. Another work attempts to improve I/O schedulers by predicting whether an I/O request will meet its deadline (Hao et al., 2017), but the predictions are based on a linear regression model trained offline on synthetically generated data. These examples suggest that a machine learning library that works in the kernel can help to build adaptive operating system components.

3 MACHINE LEARNING LIBRARY FOR OPERATING SYSTEMS

3.1 Machine Learning Library Design

Overview. There are several points and design choices worth mentioning regarding our machine learning library that will power ML applications in kernel space. First, the lack of access to standard floating-point math functions in the kernel means we have to implement nearly all math functions (including common functions such as pow and log) ourselves. Second, following the design choice seen in numerous mainstream deep-learning libraries (Abadi et al., 2016; Paszke et al., 2019), we decided on a common tensor-like representation for matrices and model parameters. Functionality for manipulating matrices, such as matrix addition, multiplication, and the l2 norm, has also been implemented as part of the library. Third, neural networks are represented as a collection of layers, each of which implements forward() for forward propagation and backward() for backward propagation. Whenever a new layer is added to the library, its forward() and backward() functions need to be implemented. In addition, our plan is to use lock-free data structures when implementing the layers, to allow for parallel processing by breaking down the computation DAG when possible. Finally, neural networks implemented with this library will use an API similar to that of the individual layers: forward() propagates input through the computation DAG, and backward() applies backward propagation via the chain rule, using each layer's backward() method to compute that layer's derivatives. In our design, loss functions are treated like any other layer in terms of implementation. Our library will implement reverse-mode automatic differentiation to compute the gradients, which are then used to update the model weights using gradient-based learning algorithms such as gradient descent.

Our initial goal is to provide users with implementations of the most widely-used linear layers, such as fully-connected and convolutional (LeCun et al., 1998) layers, and widely-used non-linearities such as ReLU (Nair & Hinton, 2010) and Sigmoid, in addition to sequential models like LSTMs (Hochreiter & Schmidhuber, 1997). We also provide widely-used losses such as cross entropy and mean squared error. Users are able to extend the library with their own layers and loss functions by providing their own implementations.

Adapting to new workloads. The ever-changing workloads of modern computing systems mean that machine learning models developed to exploit patterns in any workload must be adaptive. This could be achieved by constantly training the model, which incurs extra computational cost and memory footprint. Hence, there is a trade-off between the power of adaptation and computational efficiency. For low-dimensional and less challenging machine learning problems, where convergence can be achieved after a small number of steps, one could employ a simple feedback mechanism to control the training schedule. The goal of this mechanism is to perform inference only when the model performs better than random guessing by a pre-defined threshold. More formally, for a classification task we perform inference only when the classification accuracy over the last k batches is at least pmargin higher than the frequency of the most frequent label in these k batches. k and pmargin are adjustable, allowing users to trade faster adaptation for higher stability.
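To make this gating rule concrete, the check below is a minimal user-space C sketch of the mechanism just described. The struct and function names are hypothetical rather than KMLib's actual API, the window is capped at 64 batches for simplicity, and plain double arithmetic is used for clarity even though kernel-space code would have to rely on KMLib's own floating-point support (Section 3.1).

#include <stdbool.h>
#include <stddef.h>

struct kml_gate {
    size_t k;              /* window size in batches (k <= 64 assumed) */
    double p_margin;       /* required margin over the majority-label baseline */
    double acc[64];        /* per-batch model accuracy over the last k batches */
    double maj[64];        /* per-batch frequency of the most frequent label */
    size_t next, filled;
};

/* Record the statistics of one training batch. */
static void kml_gate_record(struct kml_gate *g, double batch_acc, double majority_freq)
{
    g->acc[g->next] = batch_acc;
    g->maj[g->next] = majority_freq;
    g->next = (g->next + 1) % g->k;
    if (g->filled < g->k)
        g->filled++;
}

/* Allow inference only once accuracy over the last k batches beats the
 * most-frequent-label baseline by at least p_margin. */
static bool kml_gate_allows_inference(const struct kml_gate *g)
{
    double acc_sum = 0.0, maj_sum = 0.0;
    size_t i;

    if (g->filled < g->k)
        return false;      /* not enough history yet */
    for (i = 0; i < g->k; i++) {
        acc_sum += g->acc[i];
        maj_sum += g->maj[i];
    }
    return acc_sum / (double)g->k >= maj_sum / (double)g->k + g->p_margin;
}

A caller would invoke kml_gate_record() after each training batch and consult kml_gate_allows_inference() before serving predictions.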
Figure 1. KMLib deployment modes: (a) kernel library mode and (b) kernel-user memory-mapped shared mode. The figure depicts the mq-kmlib.ko scheduler module, the kmlib.ko module and its OS-ML API, the training data (Xt, Yt), the gradients dL/dw, and the inference path across user space and kernel space.
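Returning to the layer abstraction described in Section 3.1, a plain-C interface along the following lines would be one natural way to express it. The kml_tensor and kml_layer names and fields here are illustrative assumptions, not KMLib's actual definitions.

#include <stddef.h>

/* Illustrative tensor and layer types; not KMLib's actual definitions. */
struct kml_tensor {
    float *data;      /* contiguous values */
    size_t *shape;    /* size of each dimension */
    size_t ndim;
};

struct kml_layer {
    /* Forward propagation: consume `in`, produce `out`. */
    void (*forward)(struct kml_layer *self,
                    const struct kml_tensor *in, struct kml_tensor *out);
    /* Backward propagation: given dL/d(out), produce dL/d(in) and
     * accumulate the layer's parameter gradients. */
    void (*backward)(struct kml_layer *self,
                     const struct kml_tensor *grad_out,
                     struct kml_tensor *grad_in);
    void *state;      /* layer-specific parameters, e.g. the weights of a
                       * fully-connected layer */
};

A network then chains such layers: forward() walks them in order over the computation DAG, and backward() walks them in reverse, applying the chain rule layer by layer, with loss functions implemented as just another layer, as described above.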
In the blocking mode, KMLib blocks the producer until its threads have consumed each input piece of data, but if the frequency of computation requests is high, this blocking mode might add extra overhead by blocking additional inputs from being processed. The dropping mode overwrites unprocessed input data: it does not add extra overhead, but KMLib then loses data, which may hurt training quality. Using these features, the user can cap memory overhead based on the needs of their ML application.

The computational overhead of training varies with the complexity of the learning model. We designed KMLib to offload training computation to KMLib library threads, but partitioning the computation DAG brings its own challenges. Even though KMLib uses lock-free data structures to reduce multi-threaded communication and synchronization overhead, dependencies in the computation DAG might still cause latencies. That is why we also allow the user to choose how many threads can be used for (i) training and (ii) inference. All of these features related to offloading training/inference computation can be disabled, in which case the computation is performed in the original thread context.

User space vs. kernel space. The first question that comes to mind is why we started implementing a machine learning library from scratch for optimizing operating system tasks, rather than using a well-known user-space library with data collected from the operating system. It is possible to collect data from the operating system and feed it into user-space ML implementations, but there are challenges with that approach. For example, offloaded training and inference should run in sub-microsecond time because of the nature of operating system tasks. KMLib can be deployed in two different modes: (i) kernel mode (Figure 1(a)) and (ii) kernel-user memory-mapped shared mode (Figure 1(b)). In kernel mode, both training and inference happen in kernel space. In kernel-user memory-mapped shared mode, KMLib collects data from kernel space and trains using user-space threads; for inference, KMLib still runs the operations in kernel space to reduce latency. We use user-kernel shared lock-free circular buffers (Desnoyers & Dagenais, 2012) for collecting training data, but KMLib threads can drain training requests only when they get scheduled, because KMLib threads work in a polling manner. We continue improving the user-space approach because we believe it improves developer productivity: developing, debugging, and testing learning models is much easier in user space than in the kernel.

4 EVALUATION

We developed a sample application of KMLib to fine-tune the mq-deadline I/O scheduler. To predict whether an I/O request will meet its deadline, we train a linear regression model. The regression model predicts the issue time for a given I/O request using the normalized block number and the ordinalized operation type as features. The predicted issue time is then thresholded to predict whether the I/O request should be early-rejected or not. We hypothesize that this should reduce the overall latency.

We conducted the experiments on QEMU with I/O throttling, running on an Intel(R) Core(TM) i7-7500U with 8GB of RAM and a 256GB Intel SSD. We use our modified version of Linux kernel v4.19.51+ for all experiments.

For workload generation, we ran the FIO (FIO) micro-benchmark, configured to perform random read and write operations with 4 threads on a 1GB dataset. Each experiment is executed on a fresh QEMU instance. We cloned the mq-deadline I/O scheduler as mq-kmlib and integrated it with KMLib. We made three key changes in the mq-kmlib I/O scheduler compared to mq-deadline: (i) in the dd_init_queue function, we inserted initialization code to set the learning rate, batch size, momentum, and number of features to learn; initial weights are also set randomly here. (ii) In the dd_dispatch_request function, we call the functions that collect Xt and Yt and perform the training steps. (iii) In the dd_insert_request function, we invoke an inference function and, based on the prediction, decide whether to early-reject the I/O request or not.

We observed that the thresholded regression output could predict with an accuracy of 74.62% whether an I/O request would miss its deadline, and this reduced the overall I/O latency by 8%, a promising result given that I/O is so much slower than memory or CPU (and hence I/O should be the first place to optimize). Our test involved a single synthetic workload that does not cover a large number of use cases, and our performance may not generalize to other workloads. Further, the emulated environment provided by QEMU may not represent a realistic use case, due to the artificial throttling in QEMU. This is why our next step is to investigate whether these results generalize to other workloads under more realistic conditions (e.g., physical machines). We are also planning to apply machine learning models to other storage stack components, such as the page cache.

We wrote nearly 3,000 lines of C/C++ code (LoC). Because the current set of machine learning tools we have implemented is small, the memory footprint of the KMLib user-space library is just 96KB, and the size of the KMLib kernel module is only 804KB. However, we expect these numbers to increase as additional functionality is implemented.
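As a concrete illustration of the learning task wired into these three hooks, the following self-contained C sketch trains an online linear-regression model on the two features above and thresholds its prediction for early rejection. The kml_-prefixed names, the SGD-with-momentum update, and the per-sample (rather than per-batch) training step are our own simplifications for illustration, not KMLib's actual code.

#include <stdbool.h>

#define N_FEATURES 2   /* normalized block number, ordinalized operation type */

struct kml_linreg {
    double w[N_FEATURES];
    double b;
    double lr;                    /* learning rate */
    double momentum;
    double vw[N_FEATURES], vb;    /* momentum buffers */
};

static double kml_linreg_predict(const struct kml_linreg *m, const double x[N_FEATURES])
{
    double y = m->b;
    for (int i = 0; i < N_FEATURES; i++)
        y += m->w[i] * x[i];
    return y;
}

/* One SGD-with-momentum step on a single (x, y) sample, roughly what
 * change (ii) in dd_dispatch_request triggers per collected request. */
static void kml_linreg_train(struct kml_linreg *m, const double x[N_FEATURES], double y)
{
    double err = kml_linreg_predict(m, x) - y;   /* gradient of squared loss w.r.t. prediction */
    for (int i = 0; i < N_FEATURES; i++) {
        m->vw[i] = m->momentum * m->vw[i] + m->lr * err * x[i];
        m->w[i] -= m->vw[i];
    }
    m->vb = m->momentum * m->vb + m->lr * err;
    m->b -= m->vb;
}

/* Change (iii) in dd_insert_request: early-reject when the predicted issue
 * time exceeds a deadline-derived threshold. */
static bool kml_should_early_reject(const struct kml_linreg *m,
                                    const double x[N_FEATURES], double threshold)
{
    return kml_linreg_predict(m, x) > threshold;
}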
5 CONCLUSION

Adapting operating system components to running workloads and hardware has been done by tuning parameters or changing critical data structure properties empirically. We have proposed that lightweight machine learning approaches may help to solve these problems. Our preliminary evaluation shows some promising results. Our plan is to expand on this work, apply it to other OS components, and evaluate and optimize the ML library for a wide range of workloads.

REFERENCES

Hashemi, M., Swersky, K., Smith, J. A., Ayers, G., Litz, H., Chang, J., Kozyrakis, C., and Ranganathan, P. Learning memory access patterns. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 1924–1933, 2018.
Shi, Z., Huang, X., Jain, A., and Lin, C. Applying deep learning to the cache replacement problem. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019, pp. 413–425, 2019.

Smith, W., Foster, I. T., and Taylor, V. E. Predicting application run times using historical information. In Job Scheduling Strategies for Parallel Processing, IPPS/SPDP'98 Workshop, Orlando, Florida, USA, March 30, 1998, Proceedings, pp. 122–142, 1998.

TensorFlow Lite. TensorFlow Lite, January 2020. https://www.tensorflow.org/lite.

Wang, N., Choi, J., Brand, D., Chen, C., and Gopalakrishnan, K. Training deep neural networks with 8-bit floating point numbers. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 7686–7695, 2018.

Wang, Y. and Yao, Q. Few-shot learning: A survey. arXiv preprint arXiv:1904.05046, 2019.

Zhang, Y. and Huang, Y. "Learned": Operating systems. SIGOPS Oper. Syst. Rev., 53(1):40–45, July 2019.