On-chain behavior prediction Machine Learning model for blockchain-based crowdsourcing
Article info

Article history:
Received 25 January 2022
Received in revised form 12 May 2022
Accepted 28 May 2022
Available online 6 June 2022

Keywords: Machine Learning, Blockchain, Behavior, Crowdsourcing, Smart contract, Bagged Trees

Abstract

In this paper, we address the problem of behavior prediction for task allocation in blockchain-based crowdsourcing framework. Centralized crowdsourcing frameworks complement workers’ reputations with predicted behavior, through Machine Learning (ML) models, to improve the task allocation performance and maintain worker engagement. Existing blockchain-based crowdsourcing frameworks allocate tasks to workers using reputation solely, which neglects the impact of a task’s context on the worker’s behavior. Our contribution is an on-chain behavior prediction ML model for task allocation on top of a proposed blockchain-based crowdsourcing framework. The ML model, hosted on blockchain, reflects a worker’s unique behavior for a task given its context. The proposed ML model is: (1) trained off-chain since it has lower monetary cost compared to on-chain training, and (2) deployed on-chain as a smart contract to enable transparent predictions. The task allocation mechanism in the proposed blockchain-based crowdsourcing framework considers workers’ predicted behavior and a Quality of Information (QoI) metric that includes distance to the task, completion time, and workers’ reputation. The evaluation conducted confirms that the proposed task allocation mechanism, implemented using Solidity, outperforms the benchmark in terms of percentage of allocation, workers’ QoI, and reputation change.

© 2022 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.future.2022.05.025
0167-739X/© 2022 Elsevier B.V. All rights reserved.
M. Kadadha, H. Otrok, R. Mizouni et al. Future Generation Computer Systems 136 (2022) 170–181
Learning (ML) models [25–27]. Such an integration paves the way to applications in different areas. However, none of the existing works leverages the blockchain for transparent on-chain predictions, because of the cost that computationally complex mechanisms would incur. Consequently, blockchain-based crowdsourcing frameworks do not utilize such an integration to complement existing task allocation mechanisms.

In this paper, a behavior prediction ML model on blockchain is proposed along with a blockchain-based framework for crowdsourcing applications such as ride-sharing and last mile delivery. The behavior prediction ML model on blockchain is proposed to complement the use of reputation in crowdsourcing. The ML model is trained by each worker to reflect his/her unique behavior towards tasks based on the day of the week and the weather condition. Workers train their models off-chain using their data and resources, according to the Bagged Trees (BT) [28] algorithm. Off-chain training mitigates the monetary cost of on-chain training. The trained models are then deployed on-chain as smart contracts, called User Behavior ML Contracts (UBMLCs). On-chain deployment makes the prediction traceable, reliable, and cost-efficient. Further, a blockchain-based crowdsourcing framework is proposed with a task allocation mechanism that interacts with the UBMLCs. The proposed framework consists of three additional smart contracts: the User Manager Contract (UMC), which manages user registration; the Task Manager Contract (TMC), which collects tasks and implements the proposed task allocation mechanism; and the Context Manager Contract (CMC), which holds the current context used for behavior prediction and can be designed based on metrics of interest to the framework.

The task allocation mechanism incorporated in TMC allocates workers based on the predicted behaviors from their UBMLCs, to account for the context, and on a Quality-of-Information (QoI) metric. The QoI reflects the capabilities of the worker and is computed based on worker reputation, distance from the task, and time for task completion. The allocation mechanism increases the probability of workers being assigned tasks they would complete, which increases their reputations and motivates their participation. In summary, our contributions are as follows:

• A behavior prediction ML model, which is trained off-chain to reduce the monetary cost overhead but deployed on-chain for transparent worker behavior prediction.
• A task allocation mechanism, incorporating the predicted behavior and a QoI metric for a blockchain-based crowdsourcing framework.
• A blockchain-based crowdsourcing framework, which overcomes the limitations of centralized frameworks and motivates workers’ engagement through its task allocation mechanism.

The proposed work is evaluated using Matlab¹ and Solidity.² The evaluation uses a real-life dataset, obtained from RideAustin,³ that holds the behavior of workers for their previously allocated tasks. When incorporating the behavior prediction in task allocation, the percentage of tasks allocated to workers predicted to complete them increases, as does the QoI of the allocated workers. In addition, the workers’ reputations increase when the behavior of workers is incorporated. Finally, the cost and scalability analysis confirms the feasibility of the proposed blockchain-based framework and the cost-efficiency of the adopted task allocation mechanism.

¹ https://www.mathworks.com/products/matlab.html.
² https://solidity.readthedocs.io/en/latest/.
³ https://public.opendatasoft.com/explore/dataset/rideaustin/table/.

2. Related work

2.1. Blockchain-based crowdsourcing

Blockchain has been introduced into crowdsourcing frameworks such as [7–14] for trusted, autonomous, and transparent execution. In [7–9], blockchain is used for incentive exchange and interaction management between requesters and workers as part of the crowdsourcing framework. The work in [7] enables secure exchange of incentives through blockchain according to workers’ contributions. More generically, the works in [8,9] utilize smart contracts to manage a task’s activity. Smart contracts are responsible for exchanging solutions and incentives between workers and requesters in a trusted manner. Nevertheless, none of these works incorporates the allocation of tasks within the blockchain.

Alternatively, the works in [10–15] incorporate greedy, auction, and matching mechanisms as part of their frameworks for task allocation. In [10], a greedy task allocation mechanism is proposed as part of the blockchain-based framework along with the solution evaluation and incentive computation mechanisms. The proposed allocation mechanism is reserve-and-allocate, where workers reserve a slot in a task and get allocated based on the QoI of interested workers.

In [11,12], auction mechanisms are utilized for allocating tasks to workers based on shared bids. While the work in [11] is proposed for full on-chain deployment, the work in [12] assumes the existence of an Internet Service Provider (ISP). The ISP acts as an intermediary between workers and the auction smart contract.

In [13–15], matching mechanisms are incorporated for the allocation of tasks in the framework. In [13], a consensus node is utilized for the collection of tasks, the computation of the matching degree of workers to tasks, and the allocation of tasks to workers. Afterward, the requester collects and evaluates the submissions of his task. In [14], the proposed framework uses a smart contract to collect tasks and accept matching requests from available workers. In [15], a Gale–Shapley matching mechanism is proposed for stable allocation of tasks to workers in a blockchain-based crowdsourcing framework.

Despite the adequacy of these frameworks, they rely on workers declaring their interest, since the framework itself does not have sufficient information. Therefore, none of these works can allocate workers based on their predicted behavior for available tasks as in centralized frameworks. Hence, they cannot maximize the performance of the framework, nor can they motivate the engagement of workers.

It is worth noting that our previous work in [10] includes an extensive comparison between centralized frameworks and blockchain-based ones. The evaluation conducted showed that the migration to blockchain-based frameworks provides the framework with transparency while maintaining a comparable performance to centralized frameworks.

2.2. AI/ML for behavior prediction

User behavior prediction using ML models has been adopted by multiple centralized platforms such as [1–4]. In [1], a behavior prediction ML model for an IoT data provider is trained using Support Vector Machine (SVM). It is used to predict the trustworthiness of the data submitted by predicting its provider’s behavior. In [2], behavior prediction ML models are used for pilot identification after being trained using a Random Forest classifier and deployed on Unmanned Aerial Vehicles (UAVs). This enables the detection of UAV hijack based on its flight pattern.

In [3], behavior prediction is proposed for centralized crowdsourcing frameworks. The platform predicts workers’ behaviors and recommends tasks to ones with high prediction scores. However, the task recommendation is based on a worker-oriented
The models from different users are used in the allocation to tasks. In [4], Random Forest classifiers are trained and used for worker behavior prediction, which is further incorporated in the task allocation mechanism.

While these works improve the allocation of tasks in crowdsourcing, they rely on a centralized platform for training and deploying the ML models. For such a platform, the computational efficiency of the selected ML models is not considered. In addition, the concealed execution of the prediction makes it unverifiable by workers.

2.3. ML and blockchain

Blockchain has been exploited for data management in AI [22,23] and ML model exchange [25–27]. In [22], blockchain is used to migrate the control of a user’s data required for training from the platform to the user. It enables users to exchange their data for monetary incentives in a decentralized, secure, and trusted manner. In [23], data is encrypted and hosted on the blockchain with access to it being managed by the data owner. ML models are trained using blockchain-based encrypted data to maintain data privacy.

In [27], the idea of ML models on blockchain was introduced with a simplistic ML classifier hosted within a smart contract. The classifier is updated based on data exchanged with the smart contract, providing transparent training and prediction for workers. While promising, the work presents the integration of ML and blockchain abstractly without discussing the computation cost of such an approach.

Differently, in [25,26], the exchange of ML models and their parameters through blockchain is proposed. In [25], Ethereum blockchain is utilized for users to train ML models and exchange them with the requesting entity. Once evaluated, their payments can be forwarded in a secure end-to-end manner. In [26], the blockchain is used to incentivize users to contribute to the training of a deep learning model by submitting their computed gradient values.

It can be seen that none of the existing works proposes a blockchain-based crowdsourcing framework that leverages ML to improve task allocation.

3. A blockchain-hosted ML model

The first part of the proposed work is a behavior prediction ML model hosted on Ethereum blockchain for workers in a crowdsourcing framework. The objective of the model is to predict the behavior of a worker for a task given its context in a transparent and traceable manner. The training is performed off-chain using a worker’s resources to reduce on-chain computation. Trained ML models are deployed on Ethereum as smart contracts. The smart contracts hosting the ML models enable users to trace the prediction generation as it is performed transparently, which is not the case with centralized frameworks.

3.1. Worker dataset features

The RideAustin dataset is used as the combined dataset of workers, who are drivers performing ride-sharing tasks. The dataset holds the information relevant to workers and the tasks they perform. In addition, it indicates whether the worker completed or cancelled the task. Table 1 presents a summary of the features in the dataset, classified as worker-related and task-related features.

Table 1
Dataset features, classified as worker-related and task-related.
Category  Feature           Description
Worker    Assigned load     Number of tasks assigned in a day
Worker    Completed load    Number of completed tasks in a day
Worker    Worker rating     Cumulative rating of the worker
Worker    Car rating        Cumulative rating of the worker’s car
Task      Publisher rating  Cumulative task publisher’s rating
Task      Starting time     Starting time of the task
Task      Surge factor      Adjustment in pricing
Task      Weekend           Kind of day; weekend or weekday
Task      Holiday           Kind of day; holiday or normal day
Task      Max temperature   Maximum temperature in °F on the task day
Task      Min temperature   Minimum temperature in °F on the task day
Task      Precipitation     Condition of rain at the task time
Task      Wind speed        Speed of the wind at the task time
Task      Wind gust         Sudden, short burst of wind speed
Task      Fog               Presence of fog at the task time
Task      Heavy fog         Presence of heavy fog at the task time
Task      Thunder           Presence of thunder at the task time

3.2. Off-chain ML model training

Training the behavior prediction ML model is performed off-chain on a worker’s end. A worker populates their dataset, consisting of the feature space for their allocated tasks. Each entry is labeled according to the outcome of the task, whether cancelled or fulfilled. The off-chain training consists of three stages: data preparation, feature extraction, and model generation.

Data Preparation is an essential pre-processing stage since the quality of the data affects the performance of the trained model. First, the categories in the dataset are balanced using the Synthetic Minority Over-sampling Technique (SMOTE) [30]. SMOTE is a primary technique selected to over-sample the minority class in the dataset due to its ability to avoid over-fitting and to maintain all the original information [31]. SMOTE balances the dataset by taking samples of the feature space and their k nearest neighbors, with k being dependent on the amount of over-sampling needed. Synthetic samples are generated along the line segments joining the samples and their neighbors. Using SMOTE, the additional synthetic samples are added without duplicating or replacing existing ones.

Second, the dataset is cleaned by replacing missing values using the iterative Probabilistic Principal Component Analysis (PCA) method [32]. PCA performs linear correlation among the columns to estimate a missing value through multiple iterations. Its statistical methodology eliminates the loss of data due to removing a sample or replacing it with a constant value. Last, duplicate samples are removed from the prepared dataset to prevent biasing the trained model.

Feature Extraction is required to differentiate significant features in the dataset from redundant ones for a trained model. While redundant features do not affect the performance of a trained model, they can result in the model over-fitting and add computational complexity [33]. The feature selection stage starts by splitting the dataset into training (80%) and testing (20%) datasets. The training dataset is used to perform an initial training using the selected ML model. The Permutation Feature Importance (PFI) is used to quantify the importance of the dataset features as scores [34]. To compute the importance score of a feature, it is removed from the dataset before training the model. The trained model’s performance is measured and compared to its performance when the feature is included in the training. A high importance score implies an important feature, whereas a score of 0 implies a redundant one, which can be removed when retraining the model. As workers behave differently, their feature selection process can lead to models with different features.
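The importance scoring described in the feature extraction stage can be illustrated with a small sketch. It follows the procedure as the text states it (remove one feature, retrain, compare performance); note that permutation importance in its usual form shuffles a feature's values instead of removing the column. The nearest-centroid classifier is only a lightweight stand-in for Bagged Trees, and the two-feature toy data (one informative feature, one constant feature) are hypothetical.

```python
from statistics import mean

def centroid_fit(rows, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids

def centroid_predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(cls):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[cls]))
    # sorted() makes tie-breaking deterministic (lowest class label wins).
    return min(sorted(centroids), key=dist)

def accuracy(train, y_train, test, y_test):
    model = centroid_fit(train, y_train)
    hits = sum(centroid_predict(model, r) == y for r, y in zip(test, y_test))
    return hits / len(test)

def drop_feature(rows, j):
    """Remove column j from every sample."""
    return [r[:j] + r[j + 1:] for r in rows]

def importance_scores(train, y_train, test, y_test):
    """Score of feature j = accuracy with all features - accuracy without j."""
    base = accuracy(train, y_train, test, y_test)
    scores = []
    for j in range(len(train[0])):
        acc = accuracy(drop_feature(train, j), y_train,
                       drop_feature(test, j), y_test)
        scores.append(base - acc)
    return scores

# Hypothetical data: label 1 = fulfilled, 0 = cancelled.
# Feature 0 separates the classes; feature 1 is constant (redundant).
train = [[0.9, 1.0], [0.8, 1.0], [0.2, 1.0], [0.1, 1.0]]
y_train = [1, 1, 0, 0]
test = [[0.85, 1.0], [0.15, 1.0]]
y_test = [1, 0]

scores = importance_scores(train, y_train, test, y_test)
print(scores)
```

In this toy case the constant feature receives a score of 0 and would be a candidate for removal before retraining, which matches the interpretation of the scores given above.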
Table 2
User Behavior ML Model Contract (UBMLC).
Variables
  Owner (address)
  False Positive (FP) (uint)
  True Positive (TP) (uint)
Functions
  Function          Parameters  Return
  decisionTreeX()   Features    tree prediction
  predict()         Features    bagging prediction
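Table 2 lists one decisionTreeX() function per tree and a predict() function that returns the bagging prediction. A minimal sketch of how such an aggregation could work is given below, assuming a simple majority vote over the trees. The three trees, their thresholds, and the feature encoding are hypothetical illustrations (feature names are borrowed from Table 1); they are not the trained trees of the paper, and the actual contract is written in Solidity.

```python
from collections import Counter

# Hypothetical trees: each maps a feature dict to 1 (fulfil) or 0 (cancel).
def decision_tree_1(f):
    return 0 if f["precipitation"] and f["weekend"] else 1

def decision_tree_2(f):
    return 1 if f["worker_rating"] >= 4 else 0

def decision_tree_3(f):
    return 0 if f["max_temperature_f"] > 100 else 1

TREES = [decision_tree_1, decision_tree_2, decision_tree_3]

def predict(features):
    """Bagging prediction: majority vote over the individual tree outputs."""
    votes = Counter(tree(features) for tree in TREES)
    return votes.most_common(1)[0][0]

rainy_weekend = {"precipitation": True, "weekend": True,
                 "worker_rating": 5, "max_temperature_f": 80}
low_rating = dict(rainy_weekend, worker_rating=3)

print(predict(rainy_weekend), predict(low_rating))
```

Using an odd number of trees with two classes avoids ties in the vote. On-chain, every additional tree adds evaluation work to each predict() call, which is why the number of trees matters for cost.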
Fig. 2. Proposed blockchain-based crowdsourcing framework with smart contracts hosted on Ethereum showing the environment and interactions between framework and users.

Table 3
Notations.
Notation      Description
α and β       Weighting factor
Cancelled_r   Number of cancelled tasks by requester r
Total_r       Total number of tasks created by requester r
Rep_r         Reputation of requester r
Accepted_w    Number of accepted submissions to tasks by worker w
Total_w       Total number of tasks assigned to worker w
r_w           Radius of interest for worker w
Rep_w         Reputation of worker w
QoI_wt        Quality of Information of worker w to task t
D_wt          The Manhattan distance between worker w and task t
CT_wt         The completion time of worker w for task t
T(c)          Set of tasks in city c
W(c)          Set of workers in city c
FW(t)         Set of filtered worker pairs for task t (worker ID, QoI)
Allocated_w   Boolean for worker w allocation status

Workers are assumed to have a maximum distance range for tasks they are interested in, r_w. For tasks within r_w, a Quality of Information (QoI) metric is computed to quantify a worker’s contribution to a task. In this work, the QoI is calculated on-chain considering the reputation of the worker (for reliability), the distance (for accuracy), and the completion time (for timeliness). It is computed based on Eq. (5).

$$QoI_{wt} = \frac{Rep_w}{D_{wt} \times CT_{wt}} \tag{5}$$

4.3. Smart contract implementation

The User Manager Contract (UMC), shown in Table 4, registers the information of requesters and workers to the proposed framework. The User data structure is designed to hold a user’s information. Type signifies whether a user is a requester or a worker. Reputation reflects the reliability of the requester/worker based on their role, calculated according to Eqs. (3) and (4). Status indicates the availability of a worker to perform tasks. Cities holds the cities a user is interested in, each presented as an IATA 3-letter city code for consistency [39]. A user can add multiple cities to their information. Latitude and Longitude reflect the GPS coordinates of an available worker. UBMLC is the address of the ML smart contract. Accepted/Cancelled holds the number of accepted tasks fulfilled by a worker or the number of tasks cancelled by a requester, depending on the user’s role. Total is the total number of tasks assigned to a worker or issued by a requester. Completion Time is the average completion time of a worker for an allocated task, estimated based on prior tasks. Allocated indicates whether a worker is currently allocated to a task or not. Car Rating holds the rating of the car a worker uses; it was added as a requirement of the dataset used.

Table 4
User Manager Contract (UMC).
Data Structure
  User
    Car Rating (uint)
    Reputation (uint)
    Latitude (uint)
    Longitude (uint)
    Cities (bytes1[])
    Allocated (uint)
    UBMLC (address)
    Completion Time (uint)
    Status (uint)
    Accepted/Cancelled (uint)
    Type (uint)
Variables
  User List (address ⇒ User)
  City Workers (bytes1 ⇒ address[])
Functions
  Function              Parameters        Return
  addUser()             User Information  –
  updateWorkerStatus()  Status            –
  updateCity()          City, Action      –
  updateLocation()      Location          –
  updateReputation()    Task Status       –
  updateUBMLC()         Address           –
  getWorkers()          City              User[]

UMC keeps users’ information in the User List mapping, which maps a user’s address to his/her User object. Workers are grouped into cities, where the list of workers in a city is kept in the City Workers mapping. It maps the city code to an array of Ethereum addresses for workers who are currently in the city. Cities are used for the mapping, as opposed to latitude and longitude coordinates, since they change less frequently, making them a more cost-efficient choice.

The addUser() function allows a user to register by providing the necessary information for a User object to be created and mapped in User List and City Workers. The updateWorkerStatus(), updateCities(), updateLocation(), and updateUBMLC() functions allow workers to update their information and registered ML smart contract. updateReputation() is an internal function called to update the reputation of a requester/worker according to Eqs. (3) and (4). The getWorkers() function is used to acquire the list of workers in a specific city.

The Task Manager Contract (TMC), shown in Table 5, collects tasks and holds the task allocation function.

The Task data structure is designed to hold the task’s information. Requester holds the Ethereum address of the requester. Reputation holds the requester’s reputation. Duration indicates the duration of the task based on its deadline. Latitude and Longitude are the GPS coordinates of the task. Min. Reputation is the minimum acceptable reputation of the workers that are eligible to participate in the task. Deposit defines the budget dedicated by the requester for accomplishing the task. Status reflects whether a task is pending or completed. TMC maintains the information of tasks in the City Tasks mapping, which maps a city code to an array of Task objects, similar to the User List in UMC. Active Cities holds the list of cities with available tasks.
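The two UMC mappings can be pictured with an off-chain sketch. The version below is a simplified Python analogue, not the Solidity contract: it keeps only a few of the User fields from Table 4, and the class name, field types, addresses, and the "AUS" city code are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    address: str                 # stands in for an Ethereum address
    user_type: str               # "worker" or "requester"
    reputation: int
    cities: list = field(default_factory=list)   # 3-letter city codes

class UserManager:
    """Simplified analogue of UMC: User List and City Workers mappings."""

    def __init__(self):
        self.user_list = {}      # address -> User
        self.city_workers = {}   # city code -> list of worker addresses

    def add_user(self, user):
        """Register a user; workers are also indexed by each of their cities."""
        self.user_list[user.address] = user
        if user.user_type == "worker":
            for city in user.cities:
                self.city_workers.setdefault(city, []).append(user.address)

    def get_workers(self, city):
        """Return the User objects of all workers registered in a city."""
        return [self.user_list[a] for a in self.city_workers.get(city, [])]

umc = UserManager()
umc.add_user(User("0xW1", "worker", 70, ["AUS"]))
umc.add_user(User("0xR1", "requester", 50, ["AUS"]))
workers_in_aus = umc.get_workers("AUS")
print([u.address for u in workers_in_aus])
```

Indexing workers by city means a lookup touches only the workers of that city, which reflects the rationale given above for keying the mapping on cities rather than on coordinates.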
Fig. 3. Interactions between workers, requesters, and the smart contracts that are part of the proposed framework.
Task Allocation. TMC executes the allocateTasks() function for a given city to allocate tasks to workers. First, the context is acquired from CMC using the getContext() function. Second, the list of workers in a city is acquired from UMC using the getWorkers() function. Third, the behavior of eligible workers is predicted for the task allocation mechanism using the predict() function in the worker’s UBMLC, with their QoI calculated at TMC for the allocation. Workers are notified about their allocated tasks. Once a task is completed, the payment is calculated based on the adopted payment mechanism. A worker’s payment can be computed based on the submission quality as in [10] or based on workers’ declared bids as in [11]. The payment is forwarded from the budget deposited by the task requester. Any remaining deposit is returned to the requester.

5. Performance evaluation

The evaluation conducted for the proposed ML models and framework is divided into three main components. First, the performance of the off-chain trained behavior prediction ML model is evaluated for a varying number of trees. Second, the performance of the proposed task allocation mechanism is compared to an existing benchmark that does not rely on prediction to allocate tasks. Third, the cost analysis and scalability of the proposed framework and incorporated mechanisms are discussed to verify its cost efficiency and scalability.

5.1. Evaluation setup and benchmark

Table 7 summarizes the evaluation setup used. A real-life workers’ dataset obtained from RideAustin is used. The dataset incorporates tasks (rides) allocated to workers (drivers), the task context, and the outcome (completed or cancelled). The dataset assumes all workers are in a single city. The workers with at least 100 rides recorded and a cancellation rate of at least 5% are selected for the evaluation, to obtain diverse entries per worker; 87 workers were identified with these criteria. Workers and Requesters are assigned initial reputation values from a uniform distribution function with α and β set to 0.5. This value is selected so that the historical reputation and the sampled reputation contribute equally to the current reputation. Tasks are created within the area of the selected workers with the additional metrics mentioned in Table 7.

Table 7
Evaluation setup.
Worker Dataset Attributes
  Considered Workers            87 workers
  Minimum rides per worker      100 rides
  Cancellation rate per worker  At least 5% cancelled
  Worker/Requester Reputation   [0–100]
  α and β                       0.5
Task Generation Attributes
  Available Tasks               [30, ..., 100]
  Longitude                     [−98.05, ..., −97.36]
  Latitude                      [29.83, ..., 30.71]
  Required Workers              1
  Number of runs                5 per data point
ML Model Training
  Software                      Matlab 2019
  ML Model                      Bagged Trees
  Dataset Split                 80% training, 20% testing
Blockchain
  Software                      Ganache Blockchain Client
  Solidity                      0.7.4
  Gas Price                     42 Gwei/gas
  Number of iterations          5
  Ether Price                   405.02 USD/Ether

Matlab is used to train and evaluate the ML models of workers. The number of trees in the bagging of a worker is varied to quantify its impact on the model’s performance. Identifying the number is important as on-chain deployment and prediction entail a monetary cost based on the computational complexity of the model, which needs to be reasonable. Hence, it is important to ensure that the deployed model is computationally efficient. The
dataset of each worker is split 80% for training and 20% for testing. The proposed mechanisms and framework are implemented and evaluated using Solidity and the Web3.js library. Ganache is used to create a local blockchain for framework deployment.

The proposed task allocation mechanism integrating workers’ behavior prediction and QoI is novel, as none of the existing works adopts it. The closest work in the literature is SenseChain [10], where QoI is used for on-chain task allocation; it is used as the benchmark. The allocation mechanism is slightly modified to ensure a fair comparison. The modification is in the calculation of the QoI, which is calculated on-chain for the allocation mechanism without a reservation phase.

Algorithm 2 illustrates the steps of the benchmark task allocation mechanism. This mechanism is to be executed within TMC, similar to the proposed task allocation mechanism.

Algorithm 2 Benchmark
Phase 1: Select Best Task for Worker
  for w ∈ W(c) do
      for t ∈ T(c) do
          Calculate D_wt as Manhattan distance
          if D_wt < r_w then
              Calculate QoI_wt using Eq. (5)
      Find t′ ← argmax_{t ∈ T(c)} QoI_wt
      FW(t′) ← FW(t′) ∪ {w, QoI_wt′}
Phase 2: Select Best Worker for Task
  Allocated_w ← FALSE for all w ∈ W(c)
  for t ∈ T(c) AND FW(t) ≠ ∅ do
      Find w′ ← argmax_{w ∈ FW(t)} QoI_wt
      Allocated_w′ ← TRUE

In the worker-task selection phase, the Manhattan distance between each worker w and task t within the same city c is calculated. Then, for each task within r_w distance, the QoI_wt is calculated. For a given worker, the task t with maximum QoI is selected and the worker’s entry is appended to the selected task’s FW(t). The task-worker allocation phase allocates tasks to the highest QoI workers.

5.2. Behavior prediction ML model performance

Fig. 4 shows the average performance of the off-chain trained ML models for workers in the framework. The results show that while the performance fluctuates when a small number of trees is included in the bagging, a continuous improvement is observed with more trees included. Such a performance is due to the reduced variance by the additional trees. The accuracy is the highest performance metric, followed by the precision, F1-score, and recall, respectively. Precision is of significance as it quantifies the proportion of positive predictions made by the model that are correct, which is relevant to task allocation in crowdsourcing. It can be inferred that the performance of the ML models converges when six trees are part of the bagging. This is important to identify as workers deploy their models on-chain, and the additional trees increase the computational complexity of the model, according to Eq. (2), and in turn the deployment and prediction cost, as will be shown in Section 5.4.

Table 8 shows the average performance results of the trained ML model before and after feature selection, when six trees are part of the bagging. It is evident that the average performance and standard deviation are almost identical before and after feature selection, which confirms that the features excluded from the training are insignificant to the model. Nevertheless, excluding these features reduces the computation complexity of the trained model according to Eqs. (1) and (2).

Table 8
Feature selection impact on ML Model.
Before Feature Selection
     Recall  Precision  F1-Score  Accuracy
  µ  0.81    0.89       0.85      0.95
  σ  0.09    0.06       0.07      0.02
After Feature Selection
     Recall  Precision  F1-Score  Accuracy
  µ  0.81    0.89       0.84      0.95
  σ  0.09    0.06       0.07      0.02

5.3. Task allocation mechanism performance

The proposed task allocation mechanism is compared to the benchmark in two scenarios: (1) when varying the number of tasks, and (2) when varying the percentage of unwilling workers. An unwilling worker cancels an allocated task if the confidence predicted by the ML model that they will perform it is lower than a threshold. When varying the number of tasks, the number of unwilling workers is fixed.

Fig. 5 presents the performance of the proposed mechanism against the benchmark considering different sizes of task sets. Unwilling workers are injected to understand the impact of their existence on the performance of the proposed task allocation mechanism.

Fig. 5(a) demonstrates that the proposed mechanism maximizes the allocation of tasks to committed workers compared to the benchmark. This increase is the result of considering the predicted behavior of a worker during the allocation, which allows for the prediction of a more accurate QoI.

Fig. 5(b) presents the average QoI of allocated workers per task. The proposed mechanism results in higher QoI when varying the number of tasks compared to the benchmark. The proposed mechanism achieves a higher QoI as it does not allocate tasks to workers that will not complete them, which is what causes the drops in the average QoI of the benchmark.

Fig. 5(c) shows the average change in workers’ reputations. The proposed mechanism leads to a higher change in the reputation of workers compared to the benchmark. This is due to the fact that the proposed mechanism only allocates tasks to workers who are expected to complete their tasks, based on the ML model predictions. The change in reputation increases with more tasks as workers have more chances to participate in sensing tasks and hence improve their reputation.

Fig. 6 presents the performance of the proposed mechanism with a fixed number of tasks and varying worker unwillingness. Fig. 6(a) presents the percentage of tasks allocated to workers willing to complete them. The figure shows that the proposed mechanism, which considers the predicted behavior, provides a better allocation compared to the benchmark, by at least 5%, as it lowers the allocation of tasks to workers that will cancel them. This performance is a consequence of introducing workers’ predicted behavior, which indicates their commitment to a given task. Hence, the tasks allocated to a worker align with their capabilities and interests. The performance gap between the two mechanisms increases when more unwilling workers are part of the framework, as the benchmark assumes workers will perform any allocated task. In addition, the proposed mechanism demonstrates a stable performance with more unwilling workers in the framework compared to the benchmark, as it accounts for their behavior and would not allocate a task to a worker that might cancel it. While the proposed mechanism improves the percentage of allocated tasks, some tasks remain un-allocated due to the unavailability of workers that can be allocated to them.
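For reference, the two phases of the benchmark in Algorithm 2 can be sketched off-chain as follows. This is an illustrative Python analogue rather than the Solidity implementation. It assumes a single average completion time per worker in place of CT_wt, skips zero-distance pairs to avoid dividing by zero in Eq. (5), and uses hypothetical worker and task data on an integer grid.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def qoi(rep, dist, completion_time):
    """Eq. (5): reputation divided by distance times completion time."""
    return rep / (dist * completion_time)

def benchmark_allocate(workers, tasks):
    """workers: {id: {"loc", "radius", "rep", "ct"}}; tasks: {id: loc}."""
    filtered = {t: [] for t in tasks}            # FW(t)

    # Phase 1: each worker selects the in-range task with the highest QoI.
    for w, info in workers.items():
        best_task, best_q = None, None
        for t, loc in tasks.items():
            d = manhattan(info["loc"], loc)
            if 0 < d < info["radius"]:           # d == 0 skipped (assumption)
                q = qoi(info["rep"], d, info["ct"])
                if best_q is None or q > best_q:
                    best_task, best_q = t, q
        if best_task is not None:
            filtered[best_task].append((w, best_q))

    # Phase 2: each task with candidates takes its highest-QoI worker.
    allocation = {}
    for t, candidates in filtered.items():
        if candidates:
            allocation[t] = max(candidates, key=lambda pair: pair[1])[0]
    return allocation

workers = {
    "w1": {"loc": (0, 0), "radius": 10, "rep": 80, "ct": 2},
    "w2": {"loc": (5, 5), "radius": 10, "rep": 60, "ct": 1},
    "w3": {"loc": (0, 1), "radius": 3, "rep": 90, "ct": 3},
}
tasks = {"t1": (1, 1), "t2": (6, 5)}

allocation = benchmark_allocate(workers, tasks)
print(allocation)
```

Here w1 and w3 both select t1 (QoI 20 and 30), so t1 goes to w3, while w2 is the only candidate for t2. The benchmark never consults a behavior prediction, so a worker who would cancel can still win a task on QoI alone, which is the gap the proposed mechanism targets.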
Fig. 6. Proposed task allocation mechanism performance results under different percentages of unwilling workers.
Table 9
Blockchain implementation cost.

Contract        Function          Gas        Ether      USD
UMC             Deployment        1228566    0.006      $1.124
UMC             addUser()         352352     0.0018     $0.322
CMC             Deployment        106753     4.3E−04    $0.089
CMC             setContext()      185149     9.3E−04    $0.169
CMC             getContext()      –          –          –
UBMLC           Deployment        3025367    0.015      $2.768
UBMLC           predict()         31748      0.0002     $0.029
TMC-proposed    Deployment        1868464    0.009      $1.710
TMC-proposed    addTask()         221609     0.001      $0.203
TMC-proposed    allocateTasks()   575082     0.003      $0.526
TMC-benchmark   Deployment        1398656    0.007      $1.280
TMC-benchmark   allocateTasks()   548073     0.003      $0.501
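As a quick check on the TMC rows of Table 9, the relative overhead of the proposed mechanism over the benchmark can be computed directly from the reported gas figures. The sketch below uses only numbers copied from the table; the dictionary layout and the `overhead_pct` helper are illustrative names, not part of the framework.

```python
# Gas figures copied from Table 9 (TMC rows only).
GAS = {
    ("TMC-proposed", "Deployment"): 1_868_464,
    ("TMC-proposed", "allocateTasks()"): 575_082,
    ("TMC-benchmark", "Deployment"): 1_398_656,
    ("TMC-benchmark", "allocateTasks()"): 548_073,
}


def overhead_pct(function: str) -> float:
    """Extra gas of the proposed TMC relative to the benchmark, in percent."""
    proposed = GAS[("TMC-proposed", function)]
    benchmark = GAS[("TMC-benchmark", function)]
    return 100.0 * (proposed - benchmark) / benchmark


alloc_overhead = overhead_pct("allocateTasks()")
deploy_overhead = overhead_pct("Deployment")
print(f"allocateTasks() overhead: {alloc_overhead:.2f}%")
print(f"Deployment overhead:      {deploy_overhead:.2f}%")
```

The per-call overhead of allocateTasks() comes out just under 5%, while the deployment overhead is larger (roughly a third), although deployment is a one-time cost rather than a per-allocation one.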
Workers are unavailable because of their locations and predicted behavior.

Fig. 6(b) presents the average QoI of allocated workers. The average QoI of workers allocated by the proposed task allocation mechanism is higher (by at least 6.5%) than that of the benchmark. The difference in QoI between the two allocation mechanisms is attributed to the higher cancellation of tasks by workers in the benchmark. The cancelled tasks, which are more numerous in the benchmark, are left incomplete, hence the drop in the QoI. As in Fig. 6(a), the proposed mechanism is less affected by the increase in the percentage of unwilling workers because it accounts for their predicted behavior during allocation.

Fig. 6(c) shows the average change in workers’ reputations. A positive change indicates an improvement in reputation, while a negative one implies a drop. The proposed mechanism helps workers maintain and improve their reputations (by at least 36%) by considering their expected behavior before the allocation. Hence, workers are motivated to engage in tasks allocated by the proposed mechanism without canceling them. On the other hand, the benchmark results in a drop in workers’ reputations, as it neglects their prior behavior for tasks in a similar context and only accounts for their availability status and current capabilities. Hence, these allocated workers might cancel allocated tasks, which lowers their reputations. The performance of both mechanisms drops with more unwilling workers in the framework as more tasks are cancelled. Nevertheless, the proposed task allocation mechanism manages to maintain a positive impact on workers’ reputations.

5.4. Cost and scalability analysis

The cost of deploying the framework and executing its mechanisms is reported in Table 9.

The deployment costs of UMC, TMC, and CMC are one-time costs paid to set up the framework. In contrast, the cost of UBMLC is paid by each worker for every ML model they deploy on-chain. With the cost of UBMLC deployment being relatively high, rational workers are encouraged to deploy models of high performance to reduce the redeployment cost and benefit from the framework.

The smart contracts’ functions have relatively small costs and are charged based on usage. For UMC, the cost of the addUser() function is charged once per user, when they register with the framework. Updating the user’s information through the other UMC function is of negligible cost to the user. For CMC, the setContext() function cost is paid when the context changes and is borne by dedicated entities. On the other hand, getContext() is a call-only function, hence is of no cost to users. For UBMLC, the predict() function cost, which includes the cost of all the trees’ predictions for a worker’s ML model, is minimal.

For TMC, the costs of the proposed task allocation and benchmark mechanisms, measured with 5 workers and 3 tasks in 1 city, are shown in Table 9. While the performance improves significantly with the proposed task allocation mechanism, as discussed previously, the cost increases by a much smaller margin (less than 5%).

Fig. 7 shows how the cost of the allocation mechanism scales with more tasks in a city. The figure shows the cost when 10 workers are within a city. The number of tasks is increased from 5 to 30, in increments of 5. As expected, the cost of the proposed allocation mechanism is higher than that of the benchmark, and the gap between the two allocation mechanisms increases with more tasks. While this might seem an issue, the maximum cost observed in the figure is approximately $0.17 per task for the proposed mechanism, as opposed to $0.12 per task for the benchmark. Compared to existing platforms with up to 20% service cost, the task allocation cost is viable for crowdsourcing tasks such as rides, where requesters pay relatively higher charges. Therefore, it can be concluded that the proposed framework is feasible, and the adopted allocation mechanism does not add significant cost overhead.

Fig. 7. Scalability of task allocation cost.

6. Conclusion

In this work, a blockchain-based crowdsensing framework for multiple requesters and multiple workers, called SenseChain, which overcomes the challenges of the centralized framework and requires reasonable cost, is presented. SenseChain utilizes Smart Contracts to replace the centralized platform by (1) registering users and maintaining their information reliably, (2) collecting and publishing tasks, (3) selecting workers in an unbiased manner, and (4) transparently evaluating solutions and sharing proportional payments with workers. The proposed framework provides a generic decentralized platform for both requesters and workers while allowing centralized-like functions in a trusted manner. This motivates the engagement of both workers and requesters and centralizes their resources in a trusted framework.

When comparing SenseChain to a centralized greedy selection, it is notable that the performance is similar in terms of the quality of selected workers, distance traveled, task completion duration, and submitted solution quality. In addition, the low deployment cost of the framework affirms the viability of blockchain-based crowdsensing frameworks.
CRediT authorship contribution statement

Maha Kadadha: Conceptualization, Methodology, Software, Writing. Hadi Otrok: Conceptualization, Supervision, Writing – review & editing. Rabeb Mizouni: Conceptualization, Supervision, Writing – review & editing. Shakti Singh: Conceptualization, Supervision, Writing – review & editing. Anis Ouali: Conceptualization, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Khalifa University of Science and Technology-Competitive Internal Research Award CIRA-2020-028, United Arab Emirates.

Rabeb Mizouni is an associate professor in Electrical and Computer Engineering at Khalifa University of Science and Technology. She got her Ph.D. and her M.Sc. in Electrical and Computer Engineering from Concordia University, Montreal, Canada in 2007 and 2002 respectively. Currently, she is interested in the deployment of context aware mobile applications, crowd sensing, software product line and cloud computing.