PS03 3
Abstract—This paper introduces the concept of Trustless AutoML and proposes a framework that combines AutoML techniques with blockchain to fully decentralize the design and training process of machine learning models. The goal is to introduce full transparency and trust into the model design pipeline in order to establish a root-of-trust.

Index Terms—decentralized, blockchain, artificial intelligence, machine learning, AutoML

I. INTRODUCTION

Machine Learning (ML) models have traditionally been black boxes, designed, trained, and vetted by subject-matter experts; this limits adoption, as the learning curve is quite steep. It is therefore common practice to use pre-existing models for specific tasks via model zoos [1] or online repositories [2]. Developers then adapt these models to fit their needs via techniques such as transfer learning [3]. As ML becomes ubiquitous in everyday life and we continue to entrust it with more complex and critical tasks [4]–[7], it is imperative that models can be trusted, as they may not all go through the same peer-review process. This need is exacerbated by AutoML, which aims to reduce onerous development costs by automating the entire ML pipeline [8]. To tackle this challenge, we leverage two emerging technologies: decentralization and containers. Decentralized applications (dApps) were first introduced by [9] and further enhanced by [10], [11]. In particular, Hyperledger Fabric [11] introduced the concept of chaincode (smart contracts) running within its own containers, which enables environment reproducibility.

II. TRUSTLESS AUTOML

and constraints. (2) The Data Broker is responsible for data sanitization and proper annotation, and can sign off on the quality of the data. (3) The Architect is responsible for defining the initial Dockerfile that will be used to perform AutoML functionality such as architecture mutations/changes, training, evaluation, and testing tasks. The Architect is also responsible for creating the training-loop and validation chaincodes. These chaincodes are equivalent to what a fit/evaluate API would do within a runtime (e.g., PyTorch). (4) Each peer runs our Trustless AutoML Runtime in parallel with a blockchain peer instance. During each epoch, the AutoML engine relies on its AutoML chaincode to start the AutoML containers with the appropriate configuration, architecture, weights, hyperparameters, dataset, and links to the proper data-handling, training, and verification chaincodes. (5) Validators attest to the performance of a model given the published results and evaluation datasets for each model.

Figure 1 shows a high-level overview of our proposed architecture. Before launching a decentralized AutoML process, the Customer and the Architect have to agree on the requirements for the desired ML model: inputs, outputs, target hardware, performance requirements, accuracy requirements, etc. The Customer and the Data Broker must also agree on the data needed to perform the AutoML task, and the Architect and the Customer need to agree on how the data should be transformed, annotated, augmented, etc. Once all requirements and the dataset are gathered, the Architect and the Data Broker write their chaincodes, deploy them, and provide an initial Dockerfile for the environment needed to perform the AutoML task. Data is stored in IPFS [12] or a distributed datastore.
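To make the chaincode/runtime equivalence in step (3) concrete, the training-loop and validation chaincodes can be thought of as implementing a minimal fit/evaluate interface. The following Python sketch is illustrative only: the class and method names (TrainingChaincode, ValidationChaincode, LinearFit) are our own assumptions, not an API defined by the framework or by Hyperledger Fabric, and a toy one-parameter least-squares model stands in for a real AutoML candidate.

```python
# Hypothetical sketch of the fit/evaluate contract that the Architect's
# training-loop and validation chaincodes mirror. All names here are
# illustrative assumptions, not Fabric or PyTorch APIs.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class EpochResult:
    epoch: int
    weights: list        # updated model parameters
    train_loss: float


class TrainingChaincode(ABC):
    """Equivalent of a runtime's `fit` API, invoked once per epoch."""
    @abstractmethod
    def fit(self, weights, hyperparams, batch) -> EpochResult: ...


class ValidationChaincode(ABC):
    """Equivalent of a runtime's `evaluate` API."""
    @abstractmethod
    def evaluate(self, weights, eval_set) -> float: ...


class LinearFit(TrainingChaincode, ValidationChaincode):
    """Toy 1-D least-squares model (y = w * x) as a stand-in candidate."""
    def fit(self, weights, hyperparams, batch):
        w = weights[0]
        lr = hyperparams["lr"]
        # one gradient step on the mean squared error of y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
        return EpochResult(epoch=hyperparams["epoch"], weights=[w],
                           train_loss=loss)

    def evaluate(self, weights, eval_set):
        w = weights[0]
        return sum((w * x - y) ** 2 for x, y in eval_set) / len(eval_set)
```

Packaging this interface as chaincode is what lets every peer replay a training step deterministically instead of trusting a single party's reported loss.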
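The per-epoch behavior of the AutoML engine in step (4) can be sketched as a simple orchestration loop. This is a sketch under stated assumptions: container launching is stubbed as an injected callable (in the real system it would start a Docker container built from the Architect's Dockerfile and wired to the proper chaincodes), and the ledger is modeled as an append-only list; the function and parameter names are ours, not the framework's.

```python
# Hypothetical sketch of the step-(4) epoch loop: the engine evaluates each
# candidate configuration in its own (stubbed) container, publishes every
# result, and keeps the best performer for the next round of mutations.

def run_epoch(candidates, launch_container, ledger):
    """candidates: list of config dicts (architecture, weights, hyperparams).
    launch_container: callable(config) -> validation score (lower is better);
    stands in for starting an AutoML container from the Dockerfile.
    ledger: append-only list standing in for results published on-chain."""
    results = []
    for config in candidates:
        score = launch_container(config)
        record = {"arch": config["arch"], "score": score}
        ledger.append(record)   # publish the result so validators can attest it
        results.append(record)
    # the winning candidate seeds the next epoch's architecture mutations
    return min(results, key=lambda r: r["score"])
```

Because every candidate's score is appended to the ledger before a winner is chosen, the selection itself is auditable: any peer can recompute `min` over the published records.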
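Finally, the validator role in step (5) pairs naturally with content-addressed storage: a validator can only attest a result if the weights it re-evaluates are provably the weights that were published. The sketch below is an assumption of ours, not the paper's protocol; `content_id` uses a plain SHA-256 over a canonical serialization as a simplified stand-in for a real IPFS CID.

```python
# Hypothetical validator check for step (5): re-run evaluation on the
# published weights and attest only if (a) the weights match their content
# address and (b) the recomputed score matches the published one.
import hashlib
import json


def content_id(obj):
    """Content address in the spirit of IPFS [12]: hash of the canonical
    JSON serialization (a simplified stand-in for a real CID)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def attest(published, evaluate, eval_set, tol=1e-6):
    """published: dict with 'weights', 'weights_cid', and 'score'.
    evaluate: the validation chaincode's evaluate function."""
    if content_id(published["weights"]) != published["weights_cid"]:
        return False   # weights were swapped after the result was published
    recomputed = evaluate(published["weights"], eval_set)
    return abs(recomputed - published["score"]) <= tol
```

Tying attestation to the content address means a dishonest peer cannot publish a strong score and later substitute different weights without the mismatch being detected.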