
Trustless AutoML for the Age of Internet of Things

Luis Angel D. Bathen, IBM Research, [email protected]
Divyesh Jadav, IBM Research, [email protected]

Abstract—This paper introduces the concept of Trustless AutoML and proposes a framework that combines AutoML techniques with blockchain to fully decentralize the design and training process of machine learning models. The goal is to introduce full transparency and trust in the model design pipeline to establish a root-of-trust.

Index Terms—decentralized, blockchain, artificial intelligence, machine learning, AutoML

I. INTRODUCTION
Machine Learning (ML) models have traditionally been black boxes, designed, trained, and vetted by subject-matter experts; this limits adoption, as the learning curve is quite steep. Thus, it is common practice to use pre-existing models for specific tasks via model zoos [1] or online repositories [2]. Developers then adapt the models to fit their needs via techniques such as transfer learning [3]. As ML becomes ubiquitous in everyday life and we continue to entrust it with more complex and critical tasks [4]–[7], it is imperative that models can be trusted, as they may not all go through the same peer review process. This need is exacerbated by AutoML, which aims to reduce onerous development costs by automating the entire ML pipeline [8]. To tackle this challenge, we leverage two emerging technologies: decentralization and containers. Decentralized applications (dApps) were first introduced by [9], and further enhanced by [10], [11]. In particular, Hyperledger Fabric [11] introduced the concept of chaincode (smart contracts) running within their own containers, which enables environment reproducibility.
II. TRUSTLESS AUTOML

A. Trustless AutoML Flow

We use decentralization to introduce trust in AutoML pipelines without having to trust the nodes performing the various design, training, and testing steps. We model our Trustless AutoML framework as a combination of chaincodes that perform different tasks such as fit, evaluate, data annotation, etc. Sample participant roles are: (1) The Customer is the entity that comes into the system with a request for an ML model that will perform some specific task given an initial dataset and constraints. (2) The Data Broker is responsible for sanitizing and properly annotating the data, and can sign off on its quality. (3) The Architect is responsible for defining the initial Dockerfile that will be used to perform AutoML functionality such as architecture mutations/changes, training, evaluation, testing tasks, etc. The Architect is also responsible for creating the training-loop and validation chaincodes. These chaincodes are equivalent to what a fit/evaluate API would do within a runtime (e.g., PyTorch). (4) Each peer runs our Trustless AutoML Runtime in parallel with a blockchain peer instance. During each epoch, the AutoML engine relies on its AutoML chaincode to start the AutoML containers with the appropriate configuration, architecture, weights, hyperparameters, dataset, and links to the proper data handling, training, and verification chaincodes. (5) Validators attest to the performance of a model given the published results and evaluation datasets for each model.
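To illustrate the fit/evaluate semantics that the training-loop and validation chaincodes encode, the sketch below shows a minimal, hypothetical PyTorch-side task interface. The class and method names (AutoMLTask, fit, evaluate) are placeholders of our own, not the framework's actual chaincode API.

```python
# Minimal sketch (assumed names) of the fit/evaluate semantics a training-loop
# chaincode would trigger inside an AutoML container. Not the actual chaincode API.
import torch
from torch import nn
from torch.utils.data import DataLoader

class AutoMLTask:
    def __init__(self, model: nn.Module, lr: float = 1e-3):
        self.model = model
        self.optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        self.criterion = nn.CrossEntropyLoss()

    def fit(self, loader: DataLoader) -> float:
        """One training epoch; returns the mean loss reported back to the chaincode."""
        self.model.train()
        total, batches = 0.0, 0
        for x, y in loader:                      # loader is assumed to yield (input, label)
            self.optimizer.zero_grad()
            loss = self.criterion(self.model(x), y)
            loss.backward()
            self.optimizer.step()
            total, batches = total + loss.item(), batches + 1
        return total / max(batches, 1)

    def evaluate(self, loader: DataLoader) -> float:
        """Top-1 accuracy on the evaluation split agreed upon with the Data Broker."""
        self.model.eval()
        correct, seen = 0, 0
        with torch.no_grad():
            for x, y in loader:
                correct += (self.model(x).argmax(dim=1) == y).sum().item()
                seen += y.numel()
        return correct / max(seen, 1)
```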
Figure 1 shows a high-level overview of our proposed architecture. Before launching a decentralized AutoML process, the Customer and the Architect have to agree on the requirements for the desired ML model: inputs, outputs, target hardware, performance requirements, accuracy requirements, etc. The Customer and the Data Broker must also agree on the data needed to perform the AutoML task. The Architect and the Customer also need to agree on how the data should be transformed, annotated, augmented, etc. Once all requirements and the dataset are gathered, the Architect and the Data Broker write their chaincodes, deploy them, and provide an initial Dockerfile for the environment needed to perform the AutoML task. Data is stored in IPFS [12] or a distributed datastore.

Figure 1. Trustless AutoML Architecture
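To make the Customer/Architect/Data Broker agreement concrete, the following is a purely illustrative requirements specification; the field names and values are our own placeholders, as the paper does not prescribe a schema.

```python
# Hypothetical job specification agreed between Customer, Architect, and Data Broker.
# Field names are illustrative only; the paper does not define a concrete schema.
job_spec = {
    "task": "image_classification",
    "inputs": {"shape": [3, 224, 224], "dtype": "float32"},
    "outputs": {"classes": 1000},
    "target_hardware": "NVIDIA Tesla V100",
    "min_top1_accuracy": 0.75,          # accuracy requirement
    "max_latency_ms": 20,               # performance requirement
    "dataset_cid": "<IPFS content identifier provided by the Data Broker>",
    "dockerfile_cid": "<IPFS content identifier of the agreed Dockerfile>",
}
```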
We now describe our proposed end-to-end flow (Figure 2): (1) Each task starts by building and deploying a container according to the agreed Dockerfile specifications and configuration parameters (e.g., hyperparameters, learning rate, weight decay, etc.). Within each container, the AutoML process runs for X epochs; this includes model training, shrinkage/expansion [13], and validation using the desired dataset. (2) Checkpoints are created, hashed, and all metrics are sealed to prevent peers from copying each other's results. Before a new training/search round starts, the leader (a random peer) chooses a nonce. Once the round is complete, each peer publishes its results, and the nonce is used to verify the metrics for each peer. The model architecture and the weights are stored in decentralized storage. Once consensus is reached, data is unsealed, and we can verify all data published by peers. This helps keep peers honest, as tampering is easily detected. (3) Each training/search step requires model selection and metric unsealing in order to continue with the exploration process. (4) When peers detect a new model being published through gossip, they evaluate it and may choose to kill the model they are currently training if its performance is below that of the newly minted model.

Figure 2. Trustless AutoML Flow
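Step (2) is essentially a commit-and-reveal scheme: a peer commits to its metrics during the round and reveals them only after every peer has published. The sketch below illustrates this idea with a nonce-salted hash commitment; the function names and the choice of SHA-256 are our assumptions, not details taken from the paper.

```python
# Minimal commit-and-reveal sketch for sealing per-round metrics (assumed scheme).
# A peer publishes seal_metrics() during the round and reveals (metrics, nonce) afterwards,
# so other peers can check that results were not changed after the fact.
import hashlib
import json

def seal_metrics(metrics: dict, round_nonce: str) -> str:
    """Return a hash commitment over the metrics, salted with the round nonce."""
    payload = json.dumps(metrics, sort_keys=True).encode() + round_nonce.encode()
    return hashlib.sha256(payload).hexdigest()

def verify_metrics(metrics: dict, round_nonce: str, commitment: str) -> bool:
    """Recompute the commitment from the revealed metrics and compare."""
    return seal_metrics(metrics, round_nonce) == commitment

# Usage: commit before publishing, verify after the round completes.
metrics = {"epoch": 3, "loss": 0.412, "top1": 0.733, "top5": 0.914}
commitment = seal_metrics(metrics, round_nonce="example-round-nonce")
assert verify_metrics(metrics, "example-round-nonce", commitment)
```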
B. Parallel Runtimes

Figure 3. Trustless AutoML Runtime


As shown in Figure 3, the blockchain peer runtime and the AutoML runtimes co-exist within the same node. The AutoML pipeline is designed to operate as a slave service that subscribes to a task queue. Each chaincode (CC1, ..., CCn) is a separate piece of agreed-upon business logic running within its own container, deployed on its own channel (multiple chaincodes can co-exist within the same channel as long as they belong to the same job). Similarly, each Task (Task1, ..., Taskm) represents one container running our software stack and is responsible for performing AutoML tasks for a given job. Jobs in this context refer to the design process for an object detection model, an image classifier, etc. For example, both CC1 and CC2 belong to the same job, and they both interact with Task1. In a sense, a task serves as the master to its respective chaincodes. Our prototype uses a combination of Hyperledger Fabric V2.0, Docker for runtimes, PyTorch for model design, and IPFS for storage.
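The task-queue pattern described above can be sketched as a worker that pulls job descriptions from a queue and launches one Docker container per task. The queue contents, image name, script, and flags below are placeholders of our own, not the framework's actual interfaces.

```python
# Sketch of a worker that subscribes to a task queue and runs one container per task.
# Queue payload, image name, entry script, and flags are illustrative placeholders.
import queue
import subprocess

task_queue: "queue.Queue[dict]" = queue.Queue()

def worker() -> None:
    while True:
        task = task_queue.get()          # blocks until a task is published
        if task is None:                 # sentinel used to shut the worker down
            break
        subprocess.run(
            [
                "docker", "run", "--rm",
                "--gpus", "all",
                task["image"],           # image built from the agreed Dockerfile
                "python", "run_automl.py",
                "--job-id", task["job_id"],
                "--nonce", task["nonce"],
            ],
            check=True,
        )
        task_queue.task_done()
```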
C. Adaptive Checkpointing

In order to decentralize the training process, we need to efficiently manage data verification prior to each training step, as datasets may be several gigabytes in size. Thus, on every epoch, the round leader (one of the blockchain peers) chooses a nonce, which is used as a seed to generate a set of indices. These indices are used to build a Merkle tree: we fetch the respective data items, sort the indices in increasing order, hash each piece of data, construct the tree, and save the nodes needed to verify the Merkle root hash. This allows us to provide a proof for data validation for each AutoML round. The same process is followed for weights, biases, and other hyperparameters. To adapt the checkpointing to what would serve the AutoML best at a given point in time, we enable a policy-based checkpoint engine, which allows the chaincodes to tell the AutoML runtime how best to checkpoint.
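The verification idea can be sketched as follows: derive sample indices from the round nonce, hash the sampled items, and fold the hashes into a Merkle root that every peer can reproduce. This is a simplified illustration under our own assumptions (SHA-256, a fixed sample size, duplicating the last node on odd levels); the paper does not specify these details.

```python
# Simplified sketch of nonce-seeded sampling plus Merkle-root computation.
# Hash function, sample size, and odd-node handling are our assumptions.
import hashlib
import random

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sample_indices(nonce: str, dataset_size: int, k: int) -> list[int]:
    """Derive k deterministic indices from the round nonce."""
    rng = random.Random(nonce)
    return sorted(rng.sample(range(dataset_size), k))

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root hash."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])      # duplicate last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Usage: every peer derives the same indices and therefore the same root,
# so a published root can be checked against locally fetched data items.
dataset = [f"item-{i}".encode() for i in range(10_000)]   # stand-in for real data
idx = sample_indices(nonce="example-round-nonce", dataset_size=len(dataset), k=256)
root = merkle_root([dataset[i] for i in idx]).hex()
```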
Figure 4. Comparison between Zip+Hash and Merkle validation schemes

Figure 4 shows a quick comparison between doing full validation on the entire dataset (Zip+Hash) and our variable Merkle-based approach on ARMv8 and Intel Core i7 systems. Though both approaches are comparable, the variable approach gives us more flexibility when deciding how much data we want to verify during each of the AutoML steps.

Figure 5. Loss, top-1 accuracy, and top-5 accuracy for three kernel sizes (Top=3, Bottom=7) running on two separate NVIDIA Tesla V100 nodes

Next, we look at the loss, top-1 accuracy, and top-5 accuracy for the kernel selection stage in our AutoML process (Figure 5). Since the same nonce is used for each epoch across the various machines, the graphs are very similar. This is consistent for all other stages in the pipeline (omitted due to space constraints), showing that we can achieve the same loss and accuracy regardless of where the task runs.
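Reproducing nearly identical curves across machines relies on seeding every source of randomness from the shared nonce. A minimal sketch of how this could be done in a PyTorch-based runtime is shown below; deriving the integer seed from the nonce via SHA-256 is our own assumption.

```python
# Sketch: derive one integer seed from the round nonce and seed all RNGs with it,
# so each node trains/searches identically for that round. SHA-256 derivation is assumed.
import hashlib
import random

import numpy as np
import torch

def seed_everything(nonce: str) -> int:
    seed = int.from_bytes(hashlib.sha256(nonce.encode()).digest()[:8], "big")
    random.seed(seed)
    np.random.seed(seed % 2**32)         # NumPy requires a 32-bit seed
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    return seed
```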
III. CONCLUSION AND FUTURE WORK

We introduced the concept of Trustless AutoML, a decentralized AutoML process for transparent, root-of-trust model design. Promising results show no impact on accuracy or model performance, with tolerable delays in the design process. Future work includes a permissionless blockchain protocol and multi-GPU NAS/training support.
REFERENCES

[1] H. Yu et al., "TensorFlow Model Garden," 2020. [Online]. Available: https://github.com/tensorflow/models
[2] J. Redmon, "Darknet: Open source neural networks in C," 2013–2016. [Online]. Available: http://pjreddie.com/darknet/
[3] F. Zhuang et al., "A comprehensive survey on transfer learning," Proceedings of the IEEE, 2021.
[4] S. Mohseni et al., "Practical solutions for machine learning safety in autonomous vehicles," CoRR, vol. abs/1912.09630, 2019.
[5] Foley & Lardner LLP, "Connected cars and autonomous vehicles survey." [Online]. Available: https://www.foley.com/files/uploads/2017-Connected-Cars-Survey-Report.pdf
[6] Berg Insight, "The future of autonomous cars." [Online]. Available: http://www.berginsight.com/ReportPDF/ProductSheet/bi-autonomous-ps.pdf
[7] M. Bojarski et al., "End to end learning for self-driving cars," CoRR, vol. abs/1604.07316, 2016.
[8] X. He et al., "AutoML: A survey of the state-of-the-art," 2021.
[9] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf
[10] Ethereum Foundation, "Ethereum," 2017. [Online]. Available: https://www.ethereum.org/
[11] Hyperledger Community, "Hyperledger Fabric," 2017. [Online]. Available: https://hyperledger-fabric.readthedocs.io/en/release/
[12] J. Benet, "InterPlanetary File System," 2017. [Online]. Available: https://github.com/ipfs/ipfs
[13] H. Cai et al., "Once-for-all: Train one network and specialize it for efficient deployment," 2020.
